In the case of two independent samples, it turns out that, among the procedures under consideration, Boschloo's technique of raising the nominal level in the standard conditional test as far as admissible performs best in terms of power against almost all alternatives. The computational burden entailed in exact sample size calculation is comparatively modest for both the uniformly most powerful unbiased randomized and the conservative non-randomized version of the exact Fisher-type test. Computing these values yields a pair of bounds enclosing the exact sample size required for the Boschloo test, and it seems reasonable to replace the exact value with the midpoint of the corresponding interval. Comparisons between these mid-N estimates and the fully exact sample sizes lead to the conclusion that the extra computational effort required for obtaining the latter is mostly dispensable. This also holds true in the case of paired binary data (McNemar setting). In the latter setting, the level-corrected score test turns out to be almost as powerful as the randomized uniformly most powerful unbiased test and should be preferred to the McNemar–Boschloo test. The mid-N rule provides a fairly tight upper bound on the exact sample size for the score test for paired proportions.
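To make the sample-size discussion concrete, the exact power of the conservative non-randomized Fisher-type test for two independent binomial samples can be obtained by enumerating all 2 × 2 outcome tables. The sketch below is our own illustration (the function name `fisher_power` and the parameter values are not from the paper):

```python
from scipy.stats import binom, fisher_exact

def fisher_power(n, p1, p2, alpha=0.05):
    """Exact rejection probability of the two-sided Fisher exact test
    for two independent Binomial(n, p1) and Binomial(n, p2) samples."""
    power = 0.0
    for x1 in range(n + 1):
        for x2 in range(n + 1):
            _, p = fisher_exact([[x1, n - x1], [x2, n - x2]])
            if p <= alpha:
                power += binom.pmf(x1, n, p1) * binom.pmf(x2, n, p2)
    return power

# Conservatism: the actual level under the null stays below the nominal 0.05.
size = fisher_power(20, 0.3, 0.3)
power = fisher_power(20, 0.2, 0.7)
```

The smallest n for which this power exceeds the target gives the conservative member of the pair of bounds enclosing the Boschloo sample size.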

In epidemiology and clinical research, exposure variables often contain a proportion of unexposed individuals: the variable equals zero for the unexposed, while it follows some continuous distribution among the exposed. Examples are smoking and alcohol consumption. We call these variables with a spike at zero (SAZ). In this paper, we perform a systematic investigation of how to model covariates with a SAZ and derive theoretical odds ratio functions for selected bivariate distributions. We consider the bivariate normal and bivariate log-normal distributions with a SAZ. Both confounding and effect modification can be elegantly described by formalizing the covariance matrix given the binary outcome variable *Y*. To model the effect of these variables, we use a procedure based on fractional polynomials, first introduced by Royston and Altman (1994, *Applied Statistics* 43: 429–467) and modified for the SAZ situation (Royston and Sauerbrei, 2008, *Multivariable model-building: a pragmatic approach to regression analysis based on fractional polynomials for modelling continuous variables*, Wiley; Becher *et al*., 2012, *Biometrical Journal* 54: 686–700). We aim to contribute to the theory, practical procedures and application in epidemiology and clinical research of deriving multivariable models for variables with a SAZ. As an example, we use data from a case–control study on lung cancer.

Kernel density estimation is a popular nonparametric approach to density estimation. Its main practical issue is bandwidth selection, a well-studied problem that continues to frustrate statisticians. We propose a robust least squares cross-validation bandwidth that significantly improves on the classical least squares cross-validation bandwidth with respect to variability and undersmoothing, adapts to different kinds of densities, and outperforms the existing bandwidths in the statistical literature and software.
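For reference, the classical least squares cross-validation criterion has a closed form in the pairwise differences for a Gaussian kernel. A minimal grid-search sketch of that classical selector (not the robust selector proposed here; the data and grid are arbitrary illustrations):

```python
import numpy as np

def lscv(data, h):
    """Classical least squares cross-validation criterion for a Gaussian
    kernel: integral of fhat^2 minus twice the mean leave-one-out density."""
    n = data.size
    d = (data[:, None] - data[None, :]) / h
    # Integral of fhat^2: the Gaussian kernel convolved with itself is N(0, 2).
    int_f2 = np.exp(-d**2 / 4).sum() / (2 * np.sqrt(np.pi) * n**2 * h)
    # Leave-one-out sum: drop the diagonal (i == j) kernel evaluations.
    K = np.exp(-d**2 / 2) / np.sqrt(2 * np.pi)
    loo = (K.sum() - n * K[0, 0]) / ((n - 1) * n * h)
    return int_f2 - 2 * loo

rng = np.random.default_rng(0)
x = rng.normal(size=200)
grid = np.linspace(0.05, 2.0, 100)
h_hat = grid[np.argmin([lscv(x, h) for h in grid])]
```

The well-known variability of this criterion across samples is precisely what the robust modification targets.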

This article examines volatility models for modeling and forecasting the Standard & Poor's 500 (S&P 500) daily stock index returns, including the autoregressive moving average, the Taylor and Schwert generalized autoregressive conditional heteroscedasticity (GARCH), the Glosten, Jagannathan and Runkle GARCH and the asymmetric power ARCH (APARCH), under the following conditional distributions: normal, Student's *t* and skewed Student's *t*. In addition, we undertake unit root (augmented Dickey–Fuller and Phillips–Perron) tests, a co-integration test and an error correction model. We study the stationary APARCH(*p*) model, and uniform convergence, strong consistency and asymptotic normality are proved under a simple ordered restriction on the parameters. In fitting these models to S&P 500 daily stock index return data over the period 1 January 2002 to 31 December 2012, we find that the APARCH model with a skewed Student's *t*-distribution is the most effective and successful for modeling and forecasting the daily stock index return series. The results of this study should be of value to policy makers and investors managing risk in stock market trading.

A statistical test for the degree of overdispersion of count data time series based on the empirical version of the (Poisson) index of dispersion is considered. The test design relies on asymptotic properties of this index of dispersion, which in turn have been analyzed for time series stemming from a compound Poisson (Poisson-stopped sum) INAR(1) model. This approach is extended to the popular Poisson INARCH(1) model, which exhibits unconditional overdispersion but has an (equidispersed) conditional Poisson distribution. The asymptotic distribution of the index of dispersion when applied to time series stemming from such a model is derived. These results allow us to investigate the ability of the dispersion test to discriminate between Poisson INAR(1) and INARCH(1) models. Furthermore, we consider whether the index of dispersion can be used to test the null hypothesis of a Poisson INARCH(1) model against the alternative of an INARCH(1) model with additional conditional overdispersion.
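For i.i.d. counts the test reduces to the classical Poisson dispersion test, which conveys the basic idea; note that the paper's contribution is precisely that serial dependence (INAR(1), INARCH(1)) changes the asymptotic distribution used below, so this sketch covers only the i.i.d. baseline with illustrative simulated data:

```python
import numpy as np
from scipy.stats import chi2

def dispersion_test(counts):
    """Classical (i.i.d.) Poisson dispersion test: under the Poisson null,
    (n - 1) * I with I = s^2 / xbar is approximately chi-square(n - 1)."""
    n = counts.size
    I = counts.var(ddof=1) / counts.mean()
    p_value = chi2.sf((n - 1) * I, df=n - 1)  # one-sided: overdispersion
    return I, p_value

rng = np.random.default_rng(42)
I_pois, p_pois = dispersion_test(rng.poisson(5.0, size=500))
# Negative binomial with variance/mean = 2: clearly overdispersed counts.
I_nb, p_nb = dispersion_test(rng.negative_binomial(5, 0.5, size=500))
```

For equidispersed Poisson data the index sits near 1, while the overdispersed series drives it well above 1 and the test rejects.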

The codispersion coefficient quantifies the association between two spatial processes for a particular direction (spatial lag) on a two-dimensional space. When this coefficient is computed for many directions, it is useful to display those values on a single graph. In this article, we suggest a graphical tool called a codispersion map to visualize the spatial correlation between two sequences on a plane. We describe how to construct a codispersion map for regular and non-regular lattices, providing algorithms in both cases. Three numerical examples are given to illustrate how useful this map can be to detect those directions for which the codispersion coefficient attains its maximum and minimum values. We also provide the R code to construct the codispersion map in practice.
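On a regular lattice, a common (Matheron-type) form of the codispersion coefficient at lag h is a correlation of lagged increments. The article supplies R code; a hypothetical Python equivalent for non-negative row/column lags, as a sketch only, would look like:

```python
import numpy as np

def codispersion(X, Y, h):
    """Codispersion coefficient between lattice processes X and Y at
    spatial lag h = (row shift, col shift), both shifts non-negative."""
    a, b = h
    rows, cols = X.shape
    dX = X[a:, b:] - X[:rows - a, :cols - b]
    dY = Y[a:, b:] - Y[:rows - a, :cols - b]
    return (dX * dY).sum() / np.sqrt((dX**2).sum() * (dY**2).sum())

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 40))
Y = 0.6 * X + 0.8 * rng.normal(size=(40, 40))
rho = codispersion(X, Y, (1, 1))
```

A codispersion map then evaluates this coefficient over a fan of lags and displays all the values on a single plot, as described in the article.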

Analytical bias reduction methods are developed for univariate rounded data for the first time. Extensions are given to rounding of multivariate data, and to smooth functionals of several distributions. As a by-product, we give for the first time the relation between rounded and unrounded multivariate cumulants. Estimators obtained by analytical bias reduction are compared with bootstrap and jackknife estimators by simulation.

Restricted maximum likelihood is preferred by many to full maximum likelihood for estimation in variance component and other random coefficient models, because its variance estimator is unbiased. It is shown that in some balanced designs this unbiasedness is accompanied by an inflation of the mean squared error. An estimator of the cluster-level variance that is uniformly more efficient than full maximum likelihood is derived. Estimators of the variance ratio are also studied.

In this article, we construct two likelihood-based confidence intervals (CIs) for a binomial proportion parameter using a double-sampling scheme with misclassified binary data. We utilize an easy-to-implement closed-form algorithm to obtain maximum likelihood estimators of the model parameters by maximizing the full likelihood function. The two CIs are a naïve Wald interval and a modified Wald interval. Using simulations, we assess and compare the coverage probabilities and average widths of the two CIs. We conclude that the modified Wald interval, unlike the naïve Wald interval, produces close-to-nominal CIs under various simulation settings and is therefore preferred in practice. Utilizing the expressions derived, we also illustrate the two CIs for a binomial proportion parameter using a real-data example.
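Leaving aside the double-sampling and misclassification machinery, the naïve Wald interval has the familiar form below; the paper's modified interval adjusts this construction, and that adjustment is not reproduced here. A minimal sketch with illustrative numbers:

```python
import numpy as np
from scipy.stats import norm

def wald_ci(x, n, level=0.95):
    """Naive Wald interval for a binomial proportion: p_hat +/- z * se."""
    z = norm.ppf(0.5 + level / 2)
    p = x / n
    half = z * np.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = wald_ci(40, 100)
```

Note that the naïve interval degenerates to a single point when x = 0 or x = n, one well-known reason why modified constructions are preferred in practice.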

Existing penalized variable selection methods for quantile regression apply only to a finite number of predictors or lack the oracle property for the resulting estimator. Quantile regression is a well-known alternative to ordinary least squares when outliers or heavy-tailed errors are present in linear models. This paper investigates variable selection through quantile regression with a diverging number of parameters. The convergence rate of the estimator with the smoothly clipped absolute deviation penalty function is studied. Moreover, the oracle property, with proper selection of the tuning parameter, is established for quantile regression under certain regularity conditions. In addition, the rank correlation screening method is used to accommodate ultra-high-dimensional data settings. Monte Carlo simulations demonstrate the finite-sample performance of the proposed estimator. Results on real data reveal that this approach provides substantially more information than ordinary least squares, conventional quantile regression, and the quantile lasso.

We propose a generalization of the Binomial distribution, called DR-Binomial, which accommodates dependence among units through a model based on the dependence ratio (Ekholm *et al*., *Biometrika*, 82, 1995, 847). Properties of the DR-Binomial are discussed, and the constraints on its parameter space are studied in detail. Likelihood-based inference is presented, using both the joint and profile likelihoods; the usefulness of the DR-Binomial in applications is illustrated on a real dataset displaying negative unit-dependence, and hence under-dispersion compared with the Binomial. Although the DR-Binomial turns out to be a reparameterization of Altham's Additive-Binomial and Kupper–Haseman's Correlated-Binomial distribution, we believe its introduction is useful, both in terms of interpretability and mathematical tractability and in terms of generalizability to the Multinomial case.

In this paper, we approach the problem of shape-constrained regression from a Bayesian perspective. A B-spline basis is used to model the regression function. The smoothness of the regression function is controlled by the order of the B-splines, and the shape is controlled by the shape of an associated control polygon. Controlling the shape of the control polygon reduces to inequality constraints on the spline coefficients. Our approach enables us to take into account combinations of shape constraints and to localize each shape constraint on a given interval. The performance of our method is investigated through a simulation study. Applications to real data sets in the food industry and on global warming are provided. © 2014 The Authors. Statistica Neerlandica © 2014 VVS.
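The control-polygon idea is easy to see for a monotonicity constraint: ordered spline coefficients yield a monotone spline. The sketch below is our own frequentist illustration via constrained least squares (not the authors' Bayesian procedure), with arbitrary knots and simulated data:

```python
import numpy as np
from scipy.interpolate import BSpline
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 80))
y = np.sqrt(x) + rng.normal(0, 0.05, x.size)   # monotone signal + noise

k = 3                                          # cubic B-splines
t = np.r_[[0.0] * (k + 1), np.linspace(0.2, 0.8, 4), [1.0] * (k + 1)]
B = BSpline.design_matrix(x, t, k).toarray()   # n x m basis matrix
m = B.shape[1]

obj = lambda c: np.sum((y - B @ c) ** 2)
# Nondecreasing control polygon <=> nondecreasing spline coefficients.
cons = {"type": "ineq", "fun": lambda c: np.diff(c)}
res = minimize(obj, np.linspace(0, 1, m), constraints=cons, method="SLSQP")
fit = B @ res.x
```

Localizing a constraint to an interval would amount to constraining only the coefficients whose basis functions are supported there.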

This paper presents two tests for strict exogeneity of the covariates in a correlated random effects panel data Tobit model. The tests are applied in an analysis of hours of work of US women. Estimation procedures when a model does not pass a test for strict exogeneity are discussed.

Knowing the effect of the factors that can influence the variability of the equating coefficients is an important tool for the development of linkage plans. This paper explores the effect of various factors on the variability of item response theory equating coefficients. The factors studied are the sample size, the number of common items, the length of the chain, and the possibility of averaging the equating transformations related to different paths that connect the same two forms. Both asymptotic and simulation results are provided.

The exact distribution of the sum of more than two independent beta random variables is not known. Even in terms of approximations, only the normal approximation is available for the sum. Motivated by Murakami [Statistica Neerlandica, 2014, doi:10.1111/stan.12032], we derive a saddlepoint approximation for the distribution of the sum. An extensive simulation study shows that it always performs better than the normal approximation.
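The normal approximation that the saddlepoint method competes with is moment matching: for S a sum of independent Beta(a, b) variables, use E[S] = Σ a/(a+b) and Var(S) = Σ ab/((a+b)²(a+b+1)). A quick Monte Carlo check of that baseline (our illustration, with arbitrary parameter values):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
a, b, k = 2.0, 3.0, 5                            # five i.i.d. Beta(2, 3) terms
mean = k * a / (a + b)                           # E[S] = 2.0
var = k * a * b / ((a + b) ** 2 * (a + b + 1))   # Var(S) = 0.2

s = rng.beta(a, b, size=(200_000, k)).sum(axis=1)
q = 2.3
mc = (s <= q).mean()                             # Monte Carlo P(S <= q)
na = norm.cdf(q, loc=mean, scale=np.sqrt(var))   # normal approximation
```

For this mildly skewed sum the two agree closely; the saddlepoint approximation is designed to do better, particularly in the tails.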

Official statistics production based on a combination of data sources, including sample surveys, censuses and administrative registers, is becoming more and more common. Reduction of response burden, gains in production cost efficiency, and the potential for detailed spatial-demographic and longitudinal statistics are some of the major advantages associated with the use of integrated statistical data. Data integration has always been an essential feature of the use of administrative register data. But survey and census data should also be integrated, so as to widen their scope and improve their quality. There are many new and difficult challenges here that are beyond the traditional topics of survey sampling and data integration. In this article, we consider statistical theory for data integration on a conceptual level. In particular, we present a two-phase life cycle model for integrated statistical microdata, which provides a framework for the various potential error sources, and outline some concepts and topics for quality assessment beyond the ideal of error-free data. A shared understanding of these issues will hopefully help us to align and coordinate efforts in future research and development.

No abstract is available for this article.

We propose composite quantile regression for dependent data, in which the errors come from short-range dependent and strictly stationary linear processes. Under some regularity conditions, we show that the composite quantile estimator enjoys root-*n* consistency and asymptotic normality. We investigate the asymptotic relative efficiency of the composite quantile estimator with respect to both single-level quantile regression and least-squares regression. When the errors have finite variance, the relative efficiency of the composite quantile estimator with respect to the least-squares estimator has a universal lower bound. Under some regularity conditions, the adaptive least absolute shrinkage and selection operator penalty leads to consistent variable selection, and the asymptotic distribution of the non-zero coefficient estimates is the same as that obtained when the true model is known. We conduct a simulation study and a real data analysis to evaluate the performance of the proposed approach.
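The composite quantile objective ties several check-loss terms together through one shared slope, with a separate intercept per quantile level. A naive derivative-free sketch on i.i.d. data (our hypothetical illustration, not the authors' procedure; in practice one would solve the equivalent linear program):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 400
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # true slope is 2

taus = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

def check_loss(u, tau):
    """Quantile check loss: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0))

def cqr_objective(theta):
    b = theta[0]                      # shared slope
    a = theta[1:]                     # one intercept per quantile level
    r = y[None, :] - a[:, None] - b * x[None, :]
    return sum(check_loss(r[k], t).sum() for k, t in enumerate(taus))

theta0 = np.r_[0.0, np.quantile(y, taus)]
res = minimize(cqr_objective, theta0, method="Nelder-Mead",
               options={"maxiter": 20000, "fatol": 1e-8, "xatol": 1e-8})
slope = res.x[0]
```

Averaging information across quantile levels is what yields the universal efficiency lower bound relative to least squares mentioned above.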

This article considers the relation between total factor productivity measures for individual production units and those for aggregates such as industries, sectors or economies. This topic has been treated in a number of influential publications, such as Hulten (1978), Gollop (1979) and Jorgenson *et al.* (1987). What distinguishes this article from other publications in this area is that I deliberately avoid making (neoclassical) structural and behavioural assumptions, such as the existence of production frontiers with certain properties or optimizing behaviour of the production units. In addition, I treat dynamic ensembles of production units, characterized by entry and exit. Thus, a greater level of generality is achieved, from which the earlier results follow as special cases.

Second-order orientation methods provide a natural tool for the analysis of spatial point process data. In this paper, we extend the spatial point pair orientation distribution function to the spatiotemporal setting. The new space–time orientation distribution function is used to detect space–time anisotropic configurations. An edge-corrected estimator is defined and illustrated through a simulation study. We apply the resulting estimator to data on the spatiotemporal distribution of human-caused fire ignition events in a 30 × 30 km square area over 4 years. Our results confirm that the approach is able to detect directional components at distinct spatiotemporal scales.

The asymptotic approach and Fisher's exact approach have often been used for testing the association between two dichotomous variables. The asymptotic approach may be appropriate in large samples but is often criticized for unacceptably high actual type I error rates in small to medium sample sizes. Fisher's exact approach suffers from conservative type I error rates and low power. For these reasons, a number of exact unconditional approaches have been proposed, which are generally more powerful than their exact conditional counterparts. We consider the traditional unconditional approach based on maximization and compare it with our proposed approach, which is based on estimation and maximization. We extend the unconditional approach based on estimation and maximization to designs with the total sum fixed. The procedures based on the Pearson chi-square, Yates's corrected, and likelihood ratio test statistics are evaluated with regard to actual type I error rates and power. A real example is used to illustrate the various testing procedures. The unconditional approach based on estimation and maximization performs well, with an actual level much closer to the nominal level. The Pearson chi-square and likelihood ratio test statistics work well with this efficient unconditional approach, which is generally more powerful than the other *p*-value calculation methods in the scenarios considered.
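The contrast between the two classical approaches is easy to reproduce on a small table (illustrative data, not the paper's example), since SciPy exposes both:

```python
from scipy.stats import fisher_exact, chi2_contingency

table = [[7, 3], [2, 8]]                              # small-sample 2x2 table
_, p_fisher = fisher_exact(table)                     # exact conditional test
chi2_u, p_pearson, dof, _ = chi2_contingency(table, correction=False)
chi2_y, p_yates, _, _ = chi2_contingency(table)       # Yates's correction

# The asymptotic Pearson test rejects at the 0.05 level while the
# conservative Fisher test does not: exactly the gap that the exact
# unconditional approaches aim to close.
```

An exact unconditional test, by contrast, would maximize (or estimate and maximize) the tail probability of the chosen statistic over the nuisance parameter rather than conditioning on the margins.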