Mixture distributions form a flexible and widely used class of distributions with many applications, yet hardly any literature exists on tests for assessing their goodness of fit. We propose two types of smooth tests of goodness of fit for mixture distributions. The first test is a genuine smooth test, and the second test makes explicit use of the mixture structure. In a simulation study the tests are compared with some traditional goodness of fit tests, which, however, are not customised for mixture distributions. The first smooth test has overall good power and generally outperforms the other tests. The second smooth test is particularly suitable for assessing the fit of each component distribution separately. The tests are applicable to both continuous and discrete distributions, and they are illustrated on three medical data sets.

The Cochran–Mantel–Haenszel tests are a suite of tests that are usually defined as conditional tests, that is, tests that assume all marginal totals are known before sighting the data. Here unconditional analogues of these tests are defined for the more usual situation in which the marginal totals are not known before sighting the data.
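As a point of reference for the unconditional analogues proposed here, the classical conditional Cochran–Mantel–Haenszel statistic for a series of stratified 2 × 2 tables can be sketched as follows; the counts are purely illustrative, not taken from the paper:

```python
import numpy as np

# K stratified 2x2 tables (rows: treatment, columns: outcome);
# illustrative counts only.
tables = np.array([
    [[10, 4], [3, 9]],
    [[8, 5], [4, 10]],
    [[12, 3], [5, 8]],
], dtype=float)

a = tables[:, 0, 0]                      # top-left cell in each stratum
row1 = tables[:, 0].sum(axis=1)          # first row totals
row2 = tables[:, 1].sum(axis=1)          # second row totals
col1 = tables[:, :, 0].sum(axis=1)       # first column totals
col2 = tables[:, :, 1].sum(axis=1)       # second column totals
n = tables.sum(axis=(1, 2))              # stratum totals

# Conditional expectation and variance of a_k given all margins
expected = row1 * col1 / n
var = row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))

# Classical CMH statistic with continuity correction; under the null
# it is approximately chi-squared with 1 degree of freedom.
cmh = (abs((a - expected).sum()) - 0.5) ** 2 / var.sum()
```

This is the usual conditional form, which treats all margins as fixed; the paper's contribution is precisely to define unconditional analogues for margins not fixed in advance.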

Multivariate control charts are used to monitor stochastic processes for changes and unusual observations. Hotelling's *T*^{2} statistic is calculated for each new observation and an out-of-control signal is issued if it goes beyond the control limits. However, this classical approach becomes unreliable as the number of variables *p* approaches the number of observations *n*, and impossible when *p* exceeds *n*. In this paper, we devise an improvement to the monitoring procedure in high-dimensional settings. We regularise the covariance matrix to estimate the baseline parameter and incorporate a leave-one-out re-sampling approach to estimate the empirical distribution of future observations. An extensive simulation study demonstrates that the new method outperforms the classical Hotelling *T*^{2} approach in power, and maintains appropriate false positive rates. We demonstrate the utility of the method using a set of quality control samples collected to monitor a gas chromatography–mass spectrometry apparatus over a period of 67 days.
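A minimal sketch of the kind of monitoring scheme described above, combining a ridge-type shrinkage of the covariance with a leave-one-out empirical control limit. The shrinkage weight `gamma`, the quantile, and all data are illustrative assumptions, not the paper's data-driven choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Baseline (in-control) sample: n observations of p variables,
# with p close to n so the raw sample covariance is ill-conditioned.
n, p = 30, 25
baseline = rng.normal(size=(n, p))

mean = baseline.mean(axis=0)
gamma = 0.2  # illustrative shrinkage weight


def regularised_cov(x):
    """Shrink the sample covariance towards a scaled identity."""
    s = np.cov(x, rowvar=False)
    return (1 - gamma) * s + gamma * np.trace(s) / p * np.eye(p)


s_inv = np.linalg.inv(regularised_cov(baseline))


def t2(x, centre=mean, precision=s_inv):
    """Hotelling T^2 statistic for a new observation x."""
    d = x - centre
    return float(d @ precision @ d)


# Leave-one-out resampling: recompute the statistic for each held-out
# baseline observation to build an empirical null distribution.
loo_stats = []
for i in range(n):
    sub = np.delete(baseline, i, axis=0)
    m_i = sub.mean(axis=0)
    p_i = np.linalg.inv(regularised_cov(sub))
    loo_stats.append(t2(baseline[i], centre=m_i, precision=p_i))

# Control limit as an empirical quantile of the LOO statistics.
limit = np.quantile(loo_stats, 0.99)
new_obs = rng.normal(size=p)
signal = t2(new_obs) > limit  # out-of-control signal?
```

The leave-one-out step is what keeps the false positive rate calibrated when the same baseline data are used both to estimate the parameters and to set the limit.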

We present methods to fit monotone polynomials in a Bayesian framework, including implementations in the popular, readily available modelling languages BUGS and Stan. The sum-of-squared polynomials parameterisation of monotone polynomials, previously considered in the frequentist framework by Murray, Müller & Turlach (2016), is again considered here due to its superior flexibility compared with other parameterisations. The specifics of our implementation are discussed, enabling end users to adapt this work to their applications. Testing was undertaken on real and simulated data sets, the output and diagnostics of which are presented. We demonstrate that Stan is preferable for high-degree polynomials, with the component-wise nature of Gibbs sampling being potentially inappropriate for such highly connected models. All code discussed here, together with sample scripts that show how to use it from R, is freely available at https://github.com/hhau/BayesianMonPol.
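The sum-of-squared polynomials idea can be illustrated outside BUGS/Stan: a polynomial whose derivative is a sum of squared polynomials is nonnegative everywhere, so its integral is monotone on the whole real line. The coefficients below are arbitrary illustrations, not fitted values:

```python
import numpy as np
from numpy.polynomial import polynomial as P

# Illustrative component polynomials (coefficients in ascending order)
q1 = np.array([1.0, -0.5, 0.3])  # q1(t) = 1 - 0.5 t + 0.3 t^2
q2 = np.array([0.2, 0.8])        # q2(t) = 0.2 + 0.8 t

# Derivative as a sum of squares: q1(t)^2 + q2(t)^2 >= 0 for all t
deriv = P.polyadd(P.polymul(q1, q1), P.polymul(q2, q2))

# Integrate to obtain the monotone polynomial; the free constant
# term plays the role of the intercept.
coefs = P.polyint(deriv)
coefs[0] = -1.5  # arbitrary intercept

xs = np.linspace(-3.0, 3.0, 201)
ys = P.polyval(xs, coefs)
assert np.all(np.diff(ys) >= 0)  # nondecreasing everywhere sampled
```

In the Bayesian implementations the priors are placed on the component coefficients, so every draw from the posterior is a monotone polynomial by construction.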

Correlation studies are an important hypothesis-generating and testing tool, and have a wide range of applications in many scientific fields. In ecological studies in particular, multiple environmental variables are often measured in an attempt to determine relationships between chemical, physical and biological factors. For example, one may wish to know whether and how soil properties correlate with plant physiology. Although correlation coefficients are widely used, their properties and limitations are often imperfectly understood. This is especially the case when one is interested in correlations between, say, trace element content in sediments and in marine organisms, where no one-to-one correspondence exists. We show that evaluating Pearson's correlation coefficient for either site-specific means or composite samples results in biased estimates, and we propose an alternative estimator. We use simulation studies to demonstrate that our estimator generally has a much smaller bias and mean squared error. We further illustrate its use in a case study of the correlation between trace element content in sediments and in mussels in Lyttelton Harbour, New Zealand.
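The bias described above is easy to reproduce by simulation under a simple hierarchical model in which site-level effects are correlated but the x- and y-measurements within a site are unpaired (e.g. sediments and organisms sampled separately at each site); all parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

rho_site = 0.6            # site-level correlation (illustrative)
sd_site, sd_within = 1.0, 1.0
n_sites, m = 40, 8        # number of sites, replicates per site

# Individual-level correlation implied by the model:
# corr(x_ij, y_ik) = rho_site * sd_site^2 / (sd_site^2 + sd_within^2)
rho_true = rho_site * sd_site ** 2 / (sd_site ** 2 + sd_within ** 2)

site_cov = [[sd_site ** 2, rho_site * sd_site ** 2],
            [rho_site * sd_site ** 2, sd_site ** 2]]

r_hat = []
for _ in range(2000):
    ab = rng.multivariate_normal([0, 0], site_cov, size=n_sites)
    # Site means: within-site noise is only partially averaged out,
    # so Pearson's r on the means overstates the correlation.
    x_means = ab[:, 0] + rng.normal(0, sd_within / np.sqrt(m), n_sites)
    y_means = ab[:, 1] + rng.normal(0, sd_within / np.sqrt(m), n_sites)
    r_hat.append(np.corrcoef(x_means, y_means)[0, 1])

bias = np.mean(r_hat) - rho_true  # positive: site means inflate rho
```

Averaging shrinks the within-site noise by a factor of 1/m, so the correlation computed from site means converges to the site-level correlation rather than the individual-level one, which is the bias the paper's alternative estimator corrects.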

Linear mixed models are regularly applied to animal and plant breeding data to evaluate genetic potential. Residual maximum likelihood (REML) is the preferred method for estimating variance parameters associated with this type of model. Typically an iterative algorithm is required for the estimation of variance parameters. Two algorithms which can be used for this purpose are the expectation-maximisation (EM) algorithm and the parameter expanded EM (PX-EM) algorithm. Both, particularly the EM algorithm, can be slow to converge when compared to a Newton–Raphson type scheme such as the average information (AI) algorithm. The EM and PX-EM algorithms require specification of the complete data, comprising the observed (incomplete) and missing data. We consider a new incomplete data specification based on a conditional derivation of REML. We illustrate the use of the resulting new algorithm through two examples: a sire model for lamb weight data and a balanced incomplete block soybean variety trial. In the cases where the AI algorithm failed, a REML PX-EM based on the new incomplete data specification converged in 28% to 30% fewer iterations than the alternative REML PX-EM specification. For the soybean example a REML EM algorithm using the new specification converged in fewer iterations than the current standard specification of a REML PX-EM algorithm. The new specification integrates linear mixed models, Henderson's mixed model equations, REML and the REML EM algorithm into a cohesive framework.

This paper extends the ordinary quasi-symmetry (*QS*) model for square contingency tables with commensurable classification variables. The proposed generalised *QS* model is defined in terms of odds ratios that apply to ordinal variables. In particular, we present *QS* models based on global, cumulative and continuation odds ratios and discuss their properties. Finally, the conditional generalised *QS* model is introduced for local and global odds ratios. These models are illustrated through the analysis of two data sets.

We construct approximate optimal designs for minimising absolute covariances between least-squares estimators of the parameters (or linear functions of the parameters) of a linear model, thereby rendering relevant parameter estimators approximately uncorrelated with each other. In particular, we first consider the case of the covariance between two linear combinations. We then consider the case of two such covariances. For the latter we set up a compound optimisation problem, which we transform into one of maximising two functions of the design weights simultaneously. The approaches are formulated for a general regression model and are explored through some examples, including one practical problem arising in chemistry.

Adaptive clinical trials typically involve several independent stages. The *P*-values from each stage are synthesised through a so-called combination function, which ensures that the overall test will be valid if the stagewise tests are valid. In practice, however, approximate and possibly invalid stagewise tests are used. This paper studies how imperfections of the stagewise tests feed through into the combination test. Several general results are proven, including some for discrete models. An approximation formula which directly links the combined size accuracy to the component size accuracy is given. In the wider context of adaptive clinical trials, the main conclusion is that the basic tests used should be size accurate at nominal sizes both much larger and much smaller than the desired nominal size. For binary outcomes, the implication is that the parametric bootstrap should be used.
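One widely used combination function, the inverse-normal rule, can be sketched as follows. It is shown only as a standard example of combining stagewise *P*-values, not necessarily the function analysed in the paper:

```python
from statistics import NormalDist


def inverse_normal_combination(p1, p2, w1=0.5, w2=0.5):
    """Combine two stagewise P-values via the inverse-normal rule.

    If each stagewise test is valid (its P-value is uniform or
    stochastically larger under the null), the combined test is
    valid at any level.  Weights w1 + w2 must equal 1.
    """
    nd = NormalDist()
    # Map each P-value to a z-score and take a weighted sum
    z = (w1 ** 0.5) * nd.inv_cdf(1 - p1) + (w2 ** 0.5) * nd.inv_cdf(1 - p2)
    return 1 - nd.cdf(z)


combined = inverse_normal_combination(0.05, 0.05)  # about 0.01
```

The paper's point is that when the stagewise *P*-values feeding such a rule come from approximate tests, their size inaccuracies propagate, which is why accuracy is needed at nominal sizes well away from the desired overall level.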

A joint estimation approach for multiple high-dimensional Gaussian copula graphical models is proposed, which achieves estimation robustness by exploiting non-parametric rank-based correlation coefficient estimators. Although we focus on continuous data in this paper, the proposed method can be extended to deal with binary or mixed data. Based on a weighted minimisation problem, the estimators can be obtained by implementing second-order cone programming. Theoretical properties of the procedure are investigated. We show that the proposed joint estimation procedure leads to a faster convergence rate than estimating the graphs individually. It is also shown that the proposed procedure achieves exact graph structure recovery with probability tending to 1 under certain regularity conditions. Besides theoretical analysis, we conduct numerical simulations to compare the estimation performance and graph recovery performance of some state-of-the-art methods, including both joint estimation methods and methods that estimate each graph individually. The proposed method is then applied to a gene expression data set, which illustrates its practical usefulness.

This paper is concerned with interval estimation for the breakpoint parameter in segmented regression. We present score-type confidence intervals derived from the score statistic itself and from the recently proposed gradient statistic. Because the score lacks the usual regularity conditions, being non-smooth and non-monotone, naive application of the score-based statistics is infeasible, and we propose to exploit the smoothed score obtained via induced smoothing. We compare our proposals with the traditional methods based on the Wald and the likelihood ratio statistics via simulations and an analysis of a real data set: the results show that the smoothed score-like statistics perform somewhat better in practice than their competitors, even when the model is not correctly specified.

This paper is a thorough practical guide to the assumptions, inference and pitfalls of Mendelian randomisation as a method for inferring causality using genetic instruments.

This article is a review of Severini's book *Analytic Methods in Sports*, a contribution to the rapidly growing area of statistical application in sports.
