We develop statistical procedures for estimating shape and orientation of arbitrary three-dimensional particles. We focus on the case where particles cannot be observed directly, but only via sections. Volume tensors are used for describing particle shape and orientation, and we derive stereological estimators of the tensors. These estimators are combined to provide consistent estimators of the moments of the so-called particle cover density. The covariance structure associated with the particle cover density depends on the orientation and shape of the particles. For instance, if the distribution of the typical particle is invariant under rotations, then the covariance matrix is proportional to the identity matrix. We develop a non-parametric test for such isotropy. A flexible Lévy-based particle model is proposed, which may be analysed using a generalized method of moments in which the volume tensors enter. The developed methods are used to study the cell organization in the human brain cortex.

The linear regression model for right censored data, also known as the accelerated failure time model when the logarithm of survival time is used as the response variable, is a useful alternative to the Cox proportional hazards model. Empirical likelihood, as a non-parametric approach, has been demonstrated to have many desirable merits thanks to its robustness against model misspecification. However, the linear regression model with right censored data cannot directly benefit from empirical likelihood inference, mainly because the estimating equations of the conventional approach involve dependent elements. In this paper, we propose an empirical likelihood approach with a new estimating equation for linear regression with right censored data. A nested coordinate algorithm with majorization is used for solving the optimization problems with a non-differentiable objective function. We show that Wilks' theorem holds for the new empirical likelihood. We also consider the variable selection problem with empirical likelihood when the number of predictors can be large. Because the new estimating equation is non-differentiable, a quadratic approximation is applied to study the asymptotic properties of penalized empirical likelihood. We prove the oracle properties and evaluate them with simulated data. We apply our method to a Surveillance, Epidemiology, and End Results small intestine cancer dataset.
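For intuition, the empirical likelihood machinery behind this abstract can be sketched in the classical uncensored mean case (Owen's construction). This is a minimal illustration, not the paper's censored-data estimating equation; the profiling over the Lagrange multiplier is the step the paper generalizes.

```python
import numpy as np
from scipy.optimize import brentq

def el_logratio(x, mu):
    """-2 * log empirical likelihood ratio for the mean of x at mu,
    via Owen's classical construction (uncensored case, for intuition).
    Requires mu to lie strictly inside the range of x."""
    z = x - mu
    # The Lagrange multiplier lam solves sum z_i / (1 + lam * z_i) = 0,
    # subject to 1 + lam * z_i > 0 for every i; bracket just inside
    # the feasible interval (-1/max(z), -1/min(z)).
    lo = (-1.0 + 1e-10) / z.max()
    hi = (-1.0 + 1e-10) / z.min()
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)), lo, hi)
    return float(2.0 * np.sum(np.log1p(lam * z)))
```

At the sample mean the ratio is zero, and it grows as the hypothesized mean moves away, which is what makes Wilks-type calibration possible.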

We propose a new method for risk-analytic benchmark dose (BMD) estimation in a dose-response setting when the responses are measured on a continuous scale. For each dose level *d*, the observation *X*(*d*) is assumed to follow a normal distribution: *X*(*d*) ∼ N(*μ*(*d*), *σ*^{2}). No specific parametric form is imposed upon the mean *μ*(*d*), however. Instead, nonparametric maximum likelihood estimates of *μ*(*d*) and *σ* are obtained under a monotonicity constraint on *μ*(*d*). For purposes of quantitative risk assessment, a ‘hybrid’ form of risk function is defined for any dose *d* as *R*(*d*) = *P*[*X*(*d*) < *c*], where *c* > 0 is a constant independent of *d*. The BMD is then determined by inverting the additional risk function *R*_{A}(*d*) = *R*(*d*) − *R*(0) at some specified value of benchmark response. Asymptotic theory for the point estimators is derived, and a finite-sample study is conducted, using both real and simulated data. When a large number of doses are available, we propose an adaptive grouping method for estimating the BMD, which is shown to have optimal mean integrated squared error under appropriate designs.
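The inversion step can be sketched numerically under the stated normal model. The mean function `mu`, the value of `sigma`, the cutoff `c` and the benchmark response below are hypothetical illustrative choices; the paper estimates *μ*(*d*) nonparametrically under monotonicity, which this sketch does not attempt.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

# Hybrid risk under X(d) ~ N(mu(d), sigma^2): R(d) = P[X(d) < c],
# additional risk R_A(d) = R(d) - R(0).
def mu(d):
    return 100.0 - 5.0 * d  # assumed (decreasing) dose-response mean

sigma, c = 10.0, 80.0

def R(d):
    return norm.cdf((c - mu(d)) / sigma)

def additional_risk(d):
    return R(d) - R(0.0)

# Invert R_A at a benchmark response (BMR) of 0.10 to obtain the BMD.
bmr = 0.10
bmd = brentq(lambda d: additional_risk(d) - bmr, 0.0, 10.0)
```

Because *μ*(*d*) is monotone, *R*_{A} is monotone in *d*, so a one-dimensional root finder suffices for the inversion.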

Many model-free dimension reduction methods have been developed for high-dimensional regression data but have paid little attention to problems with non-linear confounding. In this paper, we propose an inverse-regression method of dependent variable transformation for detecting the presence of non-linear confounding. The benefit of using geometrical information from our method is highlighted. A ratio estimation strategy is incorporated in our approach to enhance the interpretation of variable selection. This approach can be implemented not only in principal Hessian directions (PHD) but also in other recently developed dimension reduction methods. Several simulation examples are reported for illustration, and comparisons are made with sliced inverse regression and with PHD that ignores non-linear confounding. An illustrative application to a real data set is also presented.

Supremum score test statistics are often used to evaluate hypotheses with unidentifiable nuisance parameters under the null hypothesis. Although these statistics provide an attractive framework to address non-identifiability under the null hypothesis, little attention has been paid to their distributional properties in small to moderate sample size settings. In situations where there are identifiable nuisance parameters under the null hypothesis, these statistics may behave erratically in realistic samples as a result of a non-negligible bias induced by substituting these nuisance parameters by their estimates under the null hypothesis. In this paper, we propose an adjustment to the supremum score statistics by subtracting the expected bias from the score processes and show that this adjustment does not alter the limiting null distribution of the supremum score statistics. Using a simple example from the class of zero-inflated regression models for count data, we show empirically and theoretically that the adjusted tests are superior in terms of size and power. The practical utility of this methodology is illustrated using count data in HIV research.

Panel count data arise in many fields, and a number of estimation procedures have been developed for them, along with two procedures for variable selection. In this paper, we discuss model selection and parameter estimation together. For the former, a focused information criterion (FIC) is presented, and for the latter, a frequentist model average (FMA) estimation procedure is developed. A main advantage of the FIC, and its difference from existing model selection methods, is that it emphasizes the accuracy of estimation of the parameters of interest rather than of all parameters. A further efficiency gain can be achieved by the FMA estimation procedure because, unlike existing methods, it takes into account the variability arising at the model selection stage. Asymptotic properties of the proposed estimators are established, and a simulation study suggests that the proposed methods work well in practical situations. An illustrative example is also provided. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics

Partial linear models have been widely used as a flexible method for modelling linear components in conjunction with non-parametric ones. Despite the presence of the non-parametric part, the linear, parametric part can under certain conditions be estimated at the parametric rate. In this paper, we consider a high-dimensional linear part. We show that it can be estimated at oracle rates, using the least absolute shrinkage and selection operator penalty for the linear part and a smoothness penalty for the non-parametric part.

We extend the log-mean linear parameterization for binary data to discrete variables with an arbitrary number of levels and show that, also in this case, it can be used to parameterize bi-directed graph models. Furthermore, we show that the log-mean linear parameterization allows one to simultaneously represent marginal independencies among variables and marginal independencies that only appear when certain levels are collapsed into a single one. We illustrate the application of this property by means of an example based on genetic association studies involving single-nucleotide polymorphisms. More generally, this feature provides a natural way to reduce the parameter count, while preserving the independence structure, by means of substantive constraints that give additional insight into the association structure of the variables.

We propose the Laplace Error Penalty (LEP) function for variable selection in high-dimensional regression. Unlike penalty functions based on piecewise spline constructions, the LEP is constructed as an exponential function with two tuning parameters and is infinitely differentiable everywhere except at the origin. With this construction, the LEP-based procedure acquires extra flexibility in variable selection, admits a unified derivative formula in optimization and is able to approximate the *L*_{0} penalty arbitrarily closely. We show that the LEP procedure can identify relevant predictors in exponentially high-dimensional regression with normal errors. We also establish the oracle property for the LEP estimator. Although not convex itself, the LEP yields a convex penalized least squares function under mild conditions when *p* is no greater than *n*. A coordinate descent majorization-minimization algorithm is introduced to implement the LEP procedure. In simulations and a real data analysis, the LEP methodology performs favourably among competing procedures.
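The abstract does not reproduce the LEP formula, but a penalty matching its description (exponential form, two tuning parameters, smooth away from the origin, approaching a scaled *L*_{0} penalty) can be sketched as follows. This is an illustrative form consistent with the description; the paper's exact parameterization may differ.

```python
import numpy as np

def lep(beta, lam, kappa):
    """An exponential-type penalty of the kind described above: two
    tuning parameters (lam, kappa), infinitely differentiable except
    at the origin, and approaching lam * 1{beta != 0} as kappa -> 0.
    Illustrative form, not necessarily the paper's exact definition."""
    return lam * (1.0 - np.exp(-np.abs(beta) / kappa))
```

As `kappa` shrinks, the penalty saturates almost immediately away from zero, mimicking a rescaled *L*_{0} penalty while remaining smooth off the origin.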

This paper develops a Bayesian control chart for the percentiles of the Weibull distribution, when both its in-control and out-of-control parameters are unknown. The Bayesian approach enhances parameter estimates for the small sample sizes that occur when monitoring rare events, such as in high-reliability applications. The chart monitors the parameters of the Weibull distribution directly, instead of transforming the data, as most Weibull-based charts do in order to meet the normality assumption. The chart uses accumulated knowledge resulting from the likelihood of the current sample combined with the information given by both the initial prior knowledge and all the past samples. The chart is adaptive because its control limits change (e.g. narrow) during Phase I. An example is presented and good average run length properties are demonstrated.

The bootstrap variance estimate is widely used in semiparametric inferences. However, its theoretical validity is a well-known open problem. In this paper, we provide a *first* theoretical study on the bootstrap moment estimates in semiparametric models. Specifically, we establish the bootstrap moment consistency of the Euclidean parameter, which immediately implies the consistency of *t*-type bootstrap confidence set. It is worth pointing out that the only additional cost to achieve the bootstrap moment consistency in contrast with the distribution consistency is to simply strengthen the *L*_{1} maximal inequality condition required in the latter to the *L*_{p} maximal inequality condition for *p*≥1. The general *L*_{p} multiplier inequality developed in this paper is also of independent interest. These general conclusions hold for the bootstrap methods with exchangeable bootstrap weights, for example, non-parametric bootstrap and Bayesian bootstrap. Our general theory is illustrated in the celebrated Cox regression model.
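The *t*-type bootstrap confidence set that bootstrap moment consistency justifies can be sketched in the simplest setting: a sample mean with nonparametric bootstrap resampling. This is a toy illustration of the construction, not the semiparametric Cox setting of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=200)  # toy data; parameter = mean

theta_hat = x.mean()

# Nonparametric bootstrap: resample with replacement, re-estimate theta.
B = 1000
boot = np.array([rng.choice(x, size=x.size, replace=True).mean()
                 for _ in range(B)])

# Bootstrap second-moment (variance) estimate and t-type 95% interval:
# the bootstrap standard error plugs into the usual normal-quantile form.
se_boot = boot.std(ddof=1)
ci = (theta_hat - 1.96 * se_boot, theta_hat + 1.96 * se_boot)
```

The point of the paper's moment consistency is that `se_boot` is a valid plug-in for the true standard error, which is exactly what the *t*-type interval above relies on.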

We consider hypothesis testing problems for low-dimensional coefficients in a high-dimensional additive hazard model. A variance reduced partial profiling estimator (VRPPE) is proposed and its asymptotic normality is established, which enables us to test the significance of each single coefficient when the data dimension is much larger than the sample size. Based on the p-values obtained from the proposed test statistics, we then apply a multiple testing procedure to identify significant coefficients and show that the false discovery rate can be controlled at the desired level. The proposed method is also extended to testing a low-dimensional sub-vector of coefficients. The finite sample performance of the proposed testing procedure is evaluated by simulation studies. We also apply it to two real data sets, with one focusing on testing low-dimensional coefficients and the other focusing on identifying significant coefficients through the proposed multiple testing procedure.

Assessing the absolute risk for a future disease event in presently healthy individuals has an important role in the primary prevention of cardiovascular diseases (CVD) and other chronic conditions. In this paper, we study the use of non-parametric Bayesian hazard regression techniques and posterior predictive inferences in the risk assessment task. We generalize our previously published Bayesian multivariate monotonic regression procedure to a survival analysis setting, combined with a computationally efficient estimation procedure utilizing case–base sampling. To achieve parsimony in the model fit, we allow for multidimensional relationships within specified subsets of risk factors, determined either on an *a priori* basis or as part of the estimation procedure. We apply the proposed methods to 10-year CVD risk assessment in a Finnish population.

The aim of this paper is to study the problem of estimating the quantile function of a finite population. Attention is first focused on point estimation, and asymptotic results are obtained. Confidence intervals are then constructed based on (i) the asymptotic results and (ii) a resampling technique that rescales the ‘usual’ bootstrap. A simulation study comparing the asymptotic and resampling-based results, as well as an application to a real population, is finally presented.

We propose a new summary statistic for inhomogeneous, intensity-reweighted moment stationary spatio-temporal point processes. The statistic is defined in terms of the *n*-point correlation functions of the point process, and it generalizes the *J*-function defined under stationarity. We show that our statistic can be represented in terms of the generating functional and that it is related to the spatio-temporal *K*-function. We further discuss its explicit form under some specific model assumptions and derive ratio-unbiased estimators. We finally illustrate the use of our statistic in practice.

This paper studies the asymptotic behaviour of the false discovery and non-discovery proportions of the dynamic adaptive procedure under some dependence structure. A Bahadur-type representation of the cut point in simultaneously performing a large number of tests is presented. The asymptotic bias decompositions of the false discovery and non-discovery proportions are given under some dependence structure. Beyond the existing literature, we find that the randomness due to the dynamic selection of the tuning parameter in estimating the true null rate serves as a source of the approximation error in the Bahadur representation and enters into the asymptotic bias terms of both the false discovery and the false non-discovery proportions. The theory explains to some extent why some seemingly attractive dynamic adaptive procedures do not substantially outperform the competing fixed adaptive procedures in some situations. Simulations justify our theory and findings.

Self-regulating processes are stochastic processes whose local regularity, as measured by the pointwise Hölder exponent, is a function of amplitude. They seem to provide relevant models for various signals arising for example in geophysics or biomedicine. We propose in this work an estimator of the self-regulating function (that is, the function relating amplitude and Hölder regularity) of the self-regulating midpoint displacement process and study some of its properties. We prove that it is almost surely convergent and obtain a central limit theorem. Numerical simulations show that the estimator behaves well in practice.

We develop an easy and direct way to define and compute the fiducial distribution of a real parameter for both continuous and discrete exponential families. Furthermore, such a distribution satisfies the requirements to be considered a confidence distribution. Many examples are provided for models that, although very simple, are widely used in applications. A characterization of the families for which the fiducial distribution coincides with a Bayesian posterior is given, and the strict connection with Jeffreys prior is shown. Asymptotic expansions of fiducial distributions are obtained without any further assumptions, and again, the relationship with the objective Bayesian analysis is pointed out. Finally, using the Edgeworth expansions, we compare the coverage of the fiducial intervals with that of other common intervals, proving the good behaviour of the former.

Length-biased and right-censored failure time data arise from many fields, and their analysis has recently attracted a great deal of attention. Two examples of the areas that often produce such data are epidemiological studies and cancer screening trials. In this paper, we discuss regression analysis of such data in the presence of missing covariates, for which no established inference procedure seems to exist. For the problem, we consider the data arising from the proportional hazards model and propose two inverse probability weighted estimation procedures. The asymptotic properties of the resulting estimators are established, and the extensive simulation study conducted for the evaluation of the proposed methods suggests that they work well for practical situations.

The problem of interest is to estimate the concentration curve and the area under the curve (AUC) by estimating the parameters of a linear regression model with an autocorrelated error process. We introduce a simple linear unbiased estimator of the concentration curve and the AUC. We show that this estimator constructed from a sampling design generated by an appropriate density is asymptotically optimal in the sense that it has exactly the same asymptotic performance as the best linear unbiased estimator. Moreover, we prove that the optimal design is robust with respect to a minimax criterion. When repeated observations are available, this estimator is consistent and has an asymptotic normal distribution. Finally, a simulated annealing algorithm is applied to a pharmacokinetic model with correlated errors.
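By linearity, any linear unbiased estimate of the regression coefficients immediately yields a linear unbiased AUC estimate. The sketch below illustrates this with generalized least squares under an assumed two-term exponential basis and an assumed AR(1)-type error covariance; all numerical choices are hypothetical, and the paper's optimal-design and minimax results are not reproduced.

```python
import numpy as np

# Concentration curve modelled as a linear combination of known basis
# functions, with autocorrelated errors; AUC then follows by linearity.
t = np.linspace(0.5, 12.0, 10)                              # sampling times
B = np.column_stack([np.exp(-0.3 * t), np.exp(-1.5 * t)])   # assumed basis

rho, sigma2 = 0.4, 0.05
V = sigma2 * rho ** np.abs(np.subtract.outer(t, t))         # AR(1)-type cov

rng = np.random.default_rng(3)
beta_true = np.array([4.0, -4.0])
y = B @ beta_true + rng.multivariate_normal(np.zeros(t.size), V)

# GLS (best linear unbiased) estimate of the basis coefficients.
Vi = np.linalg.inv(V)
beta_hat = np.linalg.solve(B.T @ Vi @ B, B.T @ Vi @ y)

# AUC over [0, T]: integrals of the basis functions times coefficients.
T = 12.0
basis_int = np.array([(1 - np.exp(-0.3 * T)) / 0.3,
                      (1 - np.exp(-1.5 * T)) / 1.5])
auc_hat = float(basis_int @ beta_hat)
```

The paper's contribution is choosing the sampling times `t` from an appropriate design density so that a simpler estimator matches this best linear unbiased one asymptotically.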

Mixture models are commonly used in biomedical research to account for possible heterogeneity in a population. In this paper, we consider tests for homogeneity between two groups in exponential tilt mixture models. A novel pairwise pseudolikelihood approach is proposed to eliminate the unknown nuisance function. We show that, under the null hypothesis, the corresponding pseudolikelihood ratio test statistic has an asymptotic distribution given by the supremum of two squared Gaussian processes. To maintain the appeal of simplicity of conventional likelihood ratio tests, we propose two alternative tests, both shown to have a simple asymptotic distribution under the null. Simulation studies show that the proposed class of pseudolikelihood ratio tests performs well in controlling type I error and has competitive power compared with current tests. The proposed tests are illustrated by an example of partial differential expression detection using microarray data from prostate cancer patients.

Small area estimators in linear models are typically expressed as a convex combination of direct estimators and synthetic estimators from a suitable model. When auxiliary information used in the model is measured with error, a new estimator, accounting for the measurement error in the covariates, has been proposed in the literature. Recently, for the area-level model, Ybarra & Lohr (Biometrika, 95, 2008, 919) suggested a suitable modification to the estimates of small area means based on the Fay & Herriot (J. Am. Stat. Assoc., 74, 1979, 269) model when some of the covariates are measured with error. They used a frequentist approach based on the method of moments. Adopting a Bayesian approach, we propose to rewrite the measurement error model as a hierarchical model; we use improper non-informative priors on the model parameters and show, under a mild condition, that the joint posterior distribution is proper and the marginal posterior distributions of the model parameters have finite variances. We conduct a simulation study exploring different scenarios. The Bayesian predictors we propose show smaller empirical mean squared errors than the frequentist predictors of Ybarra & Lohr (Biometrika, 95, 2008, 919), and they seem also to be more stable in terms of variability and bias. We apply the proposed methodology to two real examples.

In this paper, we propose to use a special class of bivariate frailty models to study dependent censored data. The proposed models are closely linked to Archimedean copula models. We give sufficient conditions for the identifiability of this type of competing risks model. The proposed conditions are derived based on a property shared by Archimedean copula models and satisfied by several well-known bivariate frailty models. Compared with the models studied by Heckman and Honoré and Abbring and van den Berg, our models are more restrictive but can be identified with a discrete (even finite) covariate. Under our identifiability conditions, the expectation–maximization (EM) algorithm provides us with consistent estimates of the unknown parameters. Simulation studies have shown that our estimation procedure works quite well. We fit a dependent censored leukaemia data set using the Clayton copula model and end our paper with some discussions.

In this work, we develop a method of adaptive non-parametric estimation, based on ‘warped’ kernels. The aim is to estimate a real-valued function *s* from a sample of random couples (*X*,*Y*). We deal with transformed data (Φ(*X*),*Y*), with Φ a one-to-one function, to build a collection of kernel estimators. The data-driven bandwidth selection is performed with a method inspired by Goldenshluger and Lepski (Ann. Statist., 39, 2011, 1608). The method makes it possible to handle various problems such as additive and multiplicative regression, conditional density estimation, hazard rate estimation based on randomly right-censored data, and cumulative distribution function estimation from current-status data. The interest of the approach is threefold. First, the squared-bias/variance trade-off is automatically realized. Next, non-asymptotic risk bounds are derived. Lastly, the estimator is easily computed thanks to its simple expression; a short simulation study is presented.

The causal assumptions, the study design and the data are the elements required for scientific inference in empirical research. The research is adequately communicated only if all of these elements and their relations are described precisely. Causal models with design describe the study design and the missing-data mechanism together with the causal structure and allow the direct application of causal calculus in the estimation of the causal effects. The flow of the study is visualized by ordering the nodes of the causal diagram in two dimensions by their causal order and the time of the observation. Conclusions on whether a causal or observational relationship can be estimated from the collected incomplete data can be made directly from the graph. Causal models with design offer a systematic and unifying view of scientific inference and increase the clarity and speed of communication. Examples of causal models for a case–control study, a nested case–control study, a clinical trial and a two-stage case–cohort study are presented.

This paper focuses on efficient estimation, optimal rates of convergence and effective algorithms in the partly linear additive hazards regression model with current status data. We use polynomial splines to estimate both the cumulative baseline hazard function, under a monotonicity constraint, and the non-parametric regression functions, with no such constraint. We propose simultaneous sieve maximum likelihood estimation for the regression parameters and nuisance parameters and show that the resultant estimator of the regression parameter vector is asymptotically normal and achieves the semiparametric information bound. In addition, we show that the rates of convergence for the estimators of the non-parametric functions are optimal. We implement the proposed estimation through a backfitting algorithm on generalized linear models. We conduct simulation studies to examine the finite-sample performance of the proposed estimation method and present an analysis of renal function recovery data for illustration.

For right-censored survival data, it is well known that the mean survival time can be consistently estimated when the support of the censoring time contains the support of the survival time. In practice, however, this condition can easily be violated because the follow-up of a study is usually within a finite window. In this article, we show that the mean survival time is still estimable from a linear model when the support of some covariate(s) with non-zero coefficient(s) is unbounded, regardless of the length of follow-up. This implies that the mean survival time can be well estimated when the support of the linear predictor is wide in practice. The theoretical finding is further verified for finite samples by simulation studies. Simulations also show that, when both models are correctly specified, the linear model yields reasonable mean square prediction errors and outperforms the Cox model, particularly under heavy censoring and short follow-up.
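For context on why the unrestricted mean is the hard target: the standard Kaplan–Meier route only recovers the restricted mean E[min(*T*, *τ*)] within the follow-up window *τ*. A minimal self-contained implementation of that restricted quantity (with toy data, not the paper's linear-model estimator):

```python
import numpy as np

def km_curve(time, event):
    """Kaplan-Meier survival estimates at the distinct event times."""
    t = np.asarray(time, dtype=float)
    d = np.asarray(event, dtype=int)
    uniq = np.unique(t[d == 1])
    surv, s = [], 1.0
    for u in uniq:
        at_risk = np.sum(t >= u)
        deaths = np.sum((t == u) & (d == 1))
        s *= 1.0 - deaths / at_risk
        surv.append(s)
    return uniq, np.array(surv)

def restricted_mean(time, event, tau):
    """Area under the KM curve up to tau, i.e. an estimate of E[min(T, tau)]."""
    ts, surv = km_curve(time, event)
    grid = np.concatenate([[0.0], ts[ts < tau], [tau]])
    step = np.concatenate([[1.0], surv[ts < tau]])
    return float(np.sum(step * np.diff(grid)))

# Toy data: event indicator 1 = observed failure, 0 = censored.
time = np.array([1.0, 2.0, 3.0, 4.0])
event = np.array([1, 1, 0, 1])
rmst = restricted_mean(time, event, tau=4.0)
```

Anything beyond *τ* requires extrapolation, which is exactly what the covariate-support condition in the abstract supplies.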

For many stochastic models, it is difficult to make inference about the model parameters because it is impossible to write down a tractable likelihood given the observed data. A common solution is data augmentation in a Markov chain Monte Carlo (MCMC) framework. However, there are statistical problems where this approach has proved infeasible but where simulation from the model is straightforward, leading to the popularity of the approximate Bayesian computation algorithm. We introduce a forward simulation MCMC (fsMCMC) algorithm, which is primarily based upon simulation from the model. The fsMCMC algorithm formulates the simulation of the process explicitly as a data augmentation problem. By exploiting non-centred parameterizations, an efficient MCMC updating scheme for the parameters and augmented data is introduced, whilst maintaining straightforward simulation from the model. The fsMCMC algorithm is successfully applied to two distinct epidemic models including a birth–death–mutation model that has only previously been analysed using approximate Bayesian computation methods.

The Cox-Aalen model, obtained by replacing the baseline hazard function in the well-known Cox model with a covariate-dependent Aalen model, allows for both fixed and dynamic covariate effects. In this paper, we examine maximum likelihood estimation for a Cox-Aalen model based on interval-censored failure times with fixed covariates. The resulting estimator globally converges to the truth slower than the parametric rate, but its finite-dimensional component is asymptotically efficient. Numerical studies show that estimation via a constrained Newton method performs well in terms of both finite sample properties and processing time for moderate-to-large samples with few covariates. We conclude with an application of the proposed methods to assess risk factors for disease progression in psoriatic arthritis.

In this paper, we consider the linear autoregressive model with varying coefficient *θ*_{n}∈[0,1). When *θ*_{n} tends to the unit root, the moderate deviation principle for the empirical covariance is discussed, and, as statistical applications, we provide moderate deviation estimates for the least squares and Yule–Walker estimators of the parameter *θ*_{n}.

Menarche, the onset of menstruation, is an important maturational event of female childhood. Most of the studies of age at menarche make use of dichotomous (status quo) data. More information can be harnessed from recall data, but such data are often censored in an informative way. We show that the usual maximum likelihood estimator based on interval censored data, which ignores the informative nature of censoring, can be biased and inconsistent. We propose a parametric estimator of the menarcheal age distribution on the basis of a realistic model of the recall phenomenon. We identify the additional information contained in the recall data and demonstrate theoretically as well as through simulations the advantage of the maximum likelihood estimator based on recall data over that based on status quo data.

Technical advances in many areas have produced more complicated high-dimensional data sets than the usual high-dimensional data matrix, such as fMRI data collected over a period of time for independent trials, or expression levels of genes measured in different tissues. Multiple measurements exist for each variable in each sample unit of these data. Regarding the multiple measurements as an element of a Hilbert space, we propose principal component analysis (PCA) in Hilbert space. The principal components (PCs) thus defined carry information not only about the patterns of variation in individual variables but also about the relationships between variables. To extract the features with the greatest contributions to the variation explained by the PCs in high-dimensional data, we also propose sparse PCA in Hilbert space by imposing a generalized elastic-net constraint. Efficient algorithms to solve the optimization problems in our methods are provided. We also propose a criterion for selecting the tuning parameter.
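One way to read "PCA in Hilbert space" is to treat each variable's vector of repeated measurements as an element of R^m with an averaged inner product. The sketch below is an interpretation along those lines, not the paper's algorithm, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 50, 6, 20  # samples, variables, measurements per variable
X = rng.normal(size=(n, p, m))

# Centre over samples; each (sample, variable) entry is an element of
# the Hilbert space R^m with inner product <f, g> = mean(f * g).
Xc = X - X.mean(axis=0, keepdims=True)

# Covariance operator of the Hilbert-space-valued variables: a p x p
# matrix whose (j, k) entry averages <X_j, X_k> over samples.
C = np.einsum('ijm,ikm->jk', Xc, Xc) / (n * m)

# Principal components: eigenvectors of C give loadings over variables,
# largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(C)
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
```

Because the inner product integrates over the repeated measurements, the loadings reflect relationships between whole measurement profiles rather than single scalar values.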

Parametrically guided non-parametric regression is an appealing method that can reduce the bias of a non-parametric regression function estimator without increasing the variance. In this paper, we adapt this method to the censored data case using an unbiased transformation of the data and a local linear fit. The asymptotic properties of the proposed estimator are established, and its performance is evaluated via finite sample simulations.

This article studies a new procedure to test for the equality of *k* regression curves in a fully non-parametric context. The test is based on the comparison of empirical estimators of the characteristic functions of the regression residuals in each population. The asymptotic behaviour of the test statistic is studied in detail. It is shown that, under the null hypothesis, the distribution of the test statistic converges to that of a finite linear combination of independent chi-squared random variables, each with one degree of freedom. The coefficients in this linear combination can be consistently estimated. The proposed test is able to detect contiguous alternatives converging to the null at the rate *n*^{ − 1 ∕ 2}. The practical performance of the test based on the asymptotic null distribution is investigated by means of simulations.
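A simplified characteristic-function comparison for two residual samples can be sketched directly; the Riemann-sum weighting over a fixed grid below is an illustrative choice, not the paper's statistic or weighting measure.

```python
import numpy as np

def ecf(residuals, t_grid):
    """Empirical characteristic function of a residual sample on a grid."""
    return np.exp(1j * np.outer(t_grid, residuals)).mean(axis=1)

def cf_distance(res1, res2, t_grid):
    """Riemann-sum approximation to the integrated squared distance
    between two empirical characteristic functions (toy statistic;
    assumes an equally spaced t_grid)."""
    diff = ecf(res1, t_grid) - ecf(res2, t_grid)
    return float(np.sum(np.abs(diff) ** 2) * (t_grid[1] - t_grid[0]))
```

Under equality of the regression curves the residual distributions coincide, so the distance fluctuates around zero; under the alternative it stabilizes at a positive value.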

On the basis of the idea of the Nadaraya–Watson (NW) kernel smoother and the technique of the local linear (LL) smoother, we construct the NW and LL estimators of conditional mean functions and their derivatives for a left-truncated and right-censored model. The target function includes the regression function, the conditional moment and the conditional distribution function as special cases. It is assumed that the lifetime observations with covariates form a stationary *α*-mixing sequence. Asymptotic normality of the estimators is established. Finite sample behaviour of the estimators is investigated via simulations. A real data illustration is included too.

We present a statistical methodology for fitting time-varying rankings, by estimating the strength parameters of the Plackett–Luce multiple comparisons model at regularly spaced times for each ranked item. We use the little-known method of barycentric rational interpolation to interpolate between the strength parameters so that a competitor's strength can be evaluated at any time. We chose the time-varying strengths to evolve deterministically rather than stochastically, a preference that we reason often has merit. There are many statistical and computational problems to overcome when fitting anything beyond ‘toy’ data sets. The methodological innovations here include a method for maximizing a likelihood function for many parameters, approximations for modelling tied data and an approach to the elimination of secular drift of the estimated ‘strengths’. The methodology has obvious applications to fields such as marketing, although we demonstrate our approach by analysing a large data set of golf tournament results, in search of an answer to the question ‘who is the greatest golfer of all time?’
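The Plackett–Luce likelihood of one observed ranking factorizes as a sequence of choices among the items not yet placed. A minimal static implementation (the strengths below are illustrative numbers; the paper's time-varying, interpolated strengths are not attempted):

```python
import numpy as np

def plackett_luce_loglik(ranking, strength):
    """Log-likelihood of one observed ranking (best item first) under
    the Plackett-Luce model with positive strength parameters."""
    s = np.array([strength[i] for i in ranking], dtype=float)
    # The item ranked k-th is chosen from the items not yet placed,
    # with probability s_k / (s_k + s_{k+1} + ... + s_last).
    denoms = np.cumsum(s[::-1])[::-1]
    return float(np.sum(np.log(s) - np.log(denoms)))

# Example: three competitors with assumed strengths.
ll = plackett_luce_loglik([0, 1, 2], {0: 2.0, 1: 1.0, 2: 0.5})
```

Summing such terms over tournaments, with strengths evaluated at each tournament's date, gives the kind of large-parameter likelihood the paper maximizes.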

We consider the problem of supplementing survey data with additional information from a population. The framework we use is very general; examples are missing data problems, measurement error models and combining data from multiple surveys. We do not require the survey data to be a simple random sample of the population of interest. The key assumption we make is that there exists a set of common variables between the survey and the supplementary data. Thus, the supplementary data serve the dual role of providing adjustments to the survey data for model consistency and also enriching the survey data for improved efficiency. We propose a semi-parametric approach using empirical likelihood to combine data from the two sources. The method possesses favourable large and moderate sample properties. We use the method to investigate wage regression using data from the National Longitudinal Survey of Youth Study.

We consider the Whittle likelihood estimation of seasonal autoregressive fractionally integrated moving-average models in the presence of an additional measurement error and show that the spectral maximum Whittle likelihood estimator is asymptotically normal. We illustrate by simulation that ignoring measurement errors may result in incorrect inference. Hence, it is pertinent to test for the presence of measurement errors, which we do by developing a likelihood ratio (LR) test within the framework of Whittle likelihood. We derive the non-standard asymptotic null distribution of this LR test and its limiting distribution under a sequence of local alternatives. Because, in practice, we do not know the order of the seasonal autoregressive fractionally integrated moving-average model, we consider three modifications of the LR test that take model uncertainty into account. We study the finite-sample size and power properties of the LR test and its modifications. The efficacy of the proposed approach is illustrated by a real-life example.

We study semiparametric time series models with innovations following a log-concave distribution. We propose a general maximum likelihood framework that allows us to estimate simultaneously the parameters of the model and the density of the innovations. This framework can be easily adapted to many well-known models, including autoregressive moving average (ARMA), generalized autoregressive conditionally heteroscedastic (GARCH), and ARMA-GARCH models. Furthermore, we show that the estimator under our new framework is consistent in both ARMA and ARMA-GARCH settings. We demonstrate its finite sample performance via a thorough simulation study and apply it to model the daily log-return of the FTSE 100 index.

We consider classification in the situation of two groups with normally distributed data in the ‘large *p* small *n*’ framework. To counterbalance the high number of variables, we consider the thresholded independence rule. An upper bound on the classification error is established that is tailored to a mean value of interest in biological applications.
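A minimal sketch of a thresholded independence (diagonal) rule on simulated ‘large *p* small *n*’ data; the threshold value, pooling and simulation design below are illustrative assumptions, not the paper's exact rule or bound:

```python
import numpy as np

def thresholded_rule(X1, X2, x_new, t):
    """Independence (diagonal) rule keeping only coordinates whose
    standardized mean difference exceeds t; returns the predicted group."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    s2 = 0.5 * (X1.var(axis=0) + X2.var(axis=0))    # pooled diagonal variances
    d = (m1 - m2) / np.sqrt(s2)
    keep = np.abs(d) > t                            # thresholding drops noise coords
    score = np.sum((m1 - m2)[keep] * (x_new - 0.5 * (m1 + m2))[keep] / s2[keep])
    return 1 if score > 0 else 2

rng = np.random.default_rng(3)
p, n = 200, 30                                      # 'large p, small n'
mu = np.zeros(p); mu[:5] = 4.0                      # only 5 informative coordinates
X1 = rng.normal(mu, 1.0, (n, p))
X2 = rng.normal(0.0, 1.0, (n, p))
label1 = thresholded_rule(X1, X2, X1[0], t=1.0)
label2 = thresholded_rule(X1, X2, X2[0], t=1.0)
```

Without thresholding, the 195 noise coordinates would swamp the signal; the threshold restricts the rule to coordinates with a large standardized mean difference.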

We study errors-in-variables problems when the response is binary and instrumental variables are available. We construct consistent estimators by taking advantage of the prediction relation between the unobservable variables and the instruments. The asymptotic properties of the new estimator are established and illustrated through simulation studies. We also demonstrate that the method readily extends to generalized linear models and beyond. The usefulness of the method is illustrated through a real data example.

Informative dropout is a vexing problem for any biomedical study. Most existing statistical methods attempt to correct estimation bias related to this phenomenon by specifying unverifiable assumptions about the dropout mechanism. We consider a cohort study in Africa that uses an outreach programme to ascertain the vital status for dropout subjects. These data can be used to identify a number of relevant distributions. However, as only a subset of dropout subjects were followed, vital status ascertainment was incomplete. We use semi-competing risk methods as our analysis framework to address this specific case where the terminal event is incompletely ascertained and consider various procedures for estimating the marginal distribution of dropout and the marginal and conditional distributions of survival. We also consider model selection and estimation efficiency in our setting. Performance of the proposed methods is demonstrated via simulations, asymptotic study and analysis of the study data.

We study estimation and prediction in linear models where the response and the regressor variable both take values in some Hilbert space. Our main objective is to obtain consistency of a principal component-based estimator for the regression operator under minimal assumptions. In particular, we avoid some inconvenient technical restrictions that have been used throughout the literature. We develop our theory in a time-dependent setup that includes the autoregressive Hilbertian model as an important special case.

In this paper, we consider the deterministic trend model where the error process is allowed to be weakly or strongly correlated and subject to non-stationary volatility. Extant estimators of the trend coefficient are analysed. We find that under heteroskedasticity, the Cochrane–Orcutt-type estimator (with some initial condition) could be less efficient than Ordinary Least Squares (OLS) when the process is highly persistent, whereas it is asymptotically equivalent to OLS when the process is less persistent. An efficient non-parametrically weighted Cochrane–Orcutt-type estimator is then proposed. The efficiency is uniform over weak or strong serial correlation and non-stationary volatility of unknown form. The feasible estimator relies on non-parametric estimation of the volatility function, and the asymptotic theory is provided. We use a data-dependent smoothing bandwidth that can automatically adjust for the strength of non-stationarity in volatilities. The implementation does not require pretesting persistence of the process or specification of non-stationary volatility. Finite-sample evaluations via simulations and an empirical application demonstrate the good performance of the proposed estimators.

This paper discusses regression analysis of current status or case I interval-censored failure time data arising from the additive hazards model. In this situation, some covariates could be missing because of various reasons, but there may exist some auxiliary information about the missing covariates. To address the problem, we propose an estimated partial likelihood approach for estimation of regression parameters, which makes use of the available auxiliary information. The method can be easily implemented, and the asymptotic properties of the resulting estimates are established. To assess the finite sample performance of the proposed method, an extensive simulation study is conducted and indicates that the method works well.

Many studies demonstrate that inference for the parameters arising in portfolio optimization often fails. The recent literature shows that this phenomenon is mainly due to a high-dimensional asset universe. Typically, such a universe refers to the asymptotic regime in which the sample size *n* + 1 and the sample dimension *d* both go to infinity while *d* ∕ *n* → *c* ∈ (0,1). In this paper, we analyze the estimators for the excess returns’ mean and variance, the weights and the Sharpe ratio of the global minimum variance portfolio under these asymptotics with respect to consistency and asymptotic distribution. Problems in stating hypotheses in high dimension are also discussed. The applicability of the results is demonstrated by an empirical study.

In this paper, the asymptotic behavior of the conditional least squares estimators of the autoregressive parameters, of the mean of the innovations, and of the stability parameter for unstable integer-valued autoregressive processes of order 2 is described. The limit distributions and the scaling factors are different according to the following three cases: (i) decomposable, (ii) indecomposable but not positively regular, and (iii) positively regular models.

This work extends the integrated nested Laplace approximation (INLA) method to latent models outside the scope of latent Gaussian models, where independent components of the latent field can have a near-Gaussian distribution. The proposed methodology is an essential component of a larger project that aims to extend the R package INLA in order to allow the user to add flexibility and challenge the Gaussian assumptions of some of the model components in a straightforward and intuitive way. Our approach is applied to two examples, and the results are compared with those obtained by Markov chain Monte Carlo, showing similar accuracy with only a small fraction of the computational time. Implementation of the proposed extension is available in the R-INLA package.

Reduced *k*-means clustering is a method for clustering objects in a low-dimensional subspace. The advantage of this method is that both the clustering of objects and the low-dimensional subspace reflecting the cluster structure are obtained simultaneously. In this paper, the relationship between conventional *k*-means clustering and reduced *k*-means clustering is discussed. Conditions ensuring the almost sure convergence of the reduced *k*-means estimator as the sample size increases without bound are presented. Results for a more general model that encompasses both conventional and reduced *k*-means clustering are also provided. Moreover, a consistent method for selecting the numbers of clusters and dimensions is described.

This paper introduces a general framework for testing hypotheses about the structure of the mean function of complex functional processes. Important particular cases of the proposed framework are as follows: (1) testing the null hypothesis that the mean of a functional process is parametric against a general alternative modelled by penalized splines; and (2) testing the null hypothesis that the means of two possibly correlated functional processes are equal or differ by only a simple parametric function. A global pseudo-likelihood ratio test is proposed, and its asymptotic distribution is derived. The size and power properties of the test are confirmed in realistic simulation scenarios. Finite-sample power results indicate that the proposed test is much more powerful than competing alternatives. Methods are applied to testing the equality between the means of normalized *δ*-power of sleep electroencephalograms of subjects with sleep-disordered breathing and matched controls.

This paper presents a non-parametric method for estimating the conditional density associated with the jump rate of a piecewise-deterministic Markov process. In our framework, the estimation needs only one observation of the process within a long time interval. Our method relies on a generalization of Aalen's multiplicative intensity model. We prove the uniform consistency of our estimator, under some reasonable assumptions related to the primitive characteristics of the process. A simulation study illustrates the behaviour of our estimator.

Approximate Bayesian computation (ABC) is a popular technique for analysing data for complex models where the likelihood function is intractable. It involves using simulation from the model to approximate the likelihood, with this approximate likelihood then being used to construct an approximate posterior. In this paper, we consider methods that estimate the parameters by maximizing the approximate likelihood used in ABC. We give a theoretical analysis of the asymptotic properties of the resulting estimator. In particular, we derive results analogous to those of consistency and asymptotic normality for standard maximum likelihood estimation. We also discuss how sequential Monte Carlo methods provide a natural method for implementing our likelihood-based ABC procedures.
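The idea of maximizing an ABC likelihood can be sketched with a toy model: simulate at each candidate parameter, form a kernel density estimate of the summary statistic at the observed value, and maximize over a grid. The model, bandwidth and grid below are illustrative assumptions, not the paper's sequential Monte Carlo implementation:

```python
import numpy as np

def abc_loglik(theta, obs, simulate, n_sim=20_000, h=0.1):
    """ABC log-likelihood: a Gaussian-kernel density estimate, at the observed
    summary `obs`, built from n_sim model simulations at parameter theta."""
    sims = simulate(theta, n_sim)
    kde = np.mean(np.exp(-0.5 * ((sims - obs) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return np.log(kde)

rng = np.random.default_rng(5)
obs = 1.0                                        # observed summary statistic
simulate = lambda th, n: rng.normal(th, 1.0, n)  # toy model: summary ~ N(theta, 1)
grid = np.linspace(-1.0, 3.0, 41)
theta_hat = grid[np.argmax([abc_loglik(th, obs, simulate) for th in grid])]
```

Here the true likelihood is maximized at θ = 1, and the ABC maximum likelihood estimate lands near it; the paper's asymptotic analysis makes this behaviour precise as the bandwidth shrinks and the number of simulations grows.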

In this paper, we introduce a new risk measure, the so-called conditional tail moment. It is defined as the moment of order *a* ≥ 0 of the loss distribution above the upper *α*-quantile, where *α* ∈ (0,1). Estimating the conditional tail moment permits us to estimate all risk measures based on conditional moments, such as the conditional tail expectation, conditional value at risk or conditional tail variance. Here, we focus on the estimation of these risk measures in the case of extreme losses (where *α* → 0 is no longer fixed). It is moreover assumed that the loss distribution is heavy tailed and depends on a covariate. The estimation method thus combines non-parametric kernel methods with extreme-value statistics. The asymptotic distribution of the estimators is established, and their finite-sample behaviour is illustrated both on simulated data and on a real data set of daily rainfalls.
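For a fixed *α*, the empirical version of the conditional tail moment is simple; the sketch below uses an exponential toy sample (light tailed and without a covariate, unlike the paper's setting) purely as an illustration:

```python
import numpy as np

def conditional_tail_moment(losses, a, alpha):
    """Empirical moment of order a of the losses above their upper alpha-quantile."""
    q = np.quantile(losses, 1 - alpha)
    tail = losses[losses > q]
    return np.mean(tail ** a)

rng = np.random.default_rng(1)
losses = rng.exponential(1.0, 100_000)
cte = conditional_tail_moment(losses, a=1, alpha=0.05)
# Exp(1) is memoryless: E[L | L > q] = q + 1 with q = -log(0.05), about 4.0
```

With *a* = 1 this is the conditional tail expectation; *a* = 2 combined with *a* = 1 gives the conditional tail variance. The paper's extreme-value machinery is needed once *α* → 0 faster than the empirical tail can be resolved.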

Various exact tests for statistical inference are available for powerful and accurate decision rules provided that corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing *p*-values of exact tests by combining Monte Carlo simulations and statistical tables generated *a priori*. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The *p*-values are linked to the posterior means of quantiles. In this framework, we present relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations to provide a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using the theoretical propositions, we calculate the minimum number of Monte Carlo resamples needed for a desired level of accuracy on the basis of distances between actual data characteristics (e.g. sample sizes) and the characteristics of the data used to present corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via the recently developed STATA and R statistical packages.

In this paper, we develop a semiparametric regression model for longitudinal skewed data. In the new model, we allow both the transformation function and the baseline function to be unknown. The proposed model provides a much broader class of models than the existing additive and multiplicative models. Our estimators for the regression parameters, transformation function and baseline function are asymptotically normal. In particular, the estimator of the transformation function converges to its true value at the rate *n*^{ − 1 ∕ 2}, the convergence rate that one would expect for a parametric model. In simulation studies, we demonstrate that the proposed semiparametric method is robust with little loss of efficiency. Finally, we apply the new method to a study on longitudinal health care costs.

We investigate the effect of measurement error on principal component analysis in the high-dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly in terms of increased variability and not bias. In practice, the error-induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute to a large variability in component loadings, relative to the loading values, such that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.

Latent variable models have been widely used for modelling the dependence structure of multiple outcomes data. However, the formulation of a latent variable model is often unknown *a priori*, and misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes with varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes. We propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores people’s values and beliefs and the social and personal characteristics that might influence them.

Consider testing multiple hypotheses using tests that can only be evaluated by simulation, such as permutation tests or bootstrap tests. This article introduces MMCTest, a sequential algorithm that gives, with arbitrarily high probability, the same classification as a specific multiple testing procedure applied to ideal *p*-values. The method can be used with a class of multiple testing procedures that include the Benjamini and Hochberg false discovery rate procedure and the Bonferroni correction controlling the familywise error rate. One of the key features of the algorithm is that it stops sampling for all the hypotheses that can already be decided as being rejected or non-rejected. MMCTest can be interrupted at any stage and then returns three sets of hypotheses: the rejected, the non-rejected and the undecided hypotheses. A simulation study motivated by actual biological data shows that MMCTest is usable in practice and that, despite the additional guarantee, it can be computationally more efficient than other methods.
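The stopping idea can be illustrated for a single hypothesis: sample exceedance indicators in batches and stop once a confidence band around the Monte Carlo *p*-value estimate lies entirely on one side of the rejection threshold. This sketch uses a crude normal-approximation band and a fixed threshold, not MMCTest's actual multiple-testing guarantees:

```python
import random

def mc_decide(exceed, threshold, batch=100, max_draws=10_000, z=3.0):
    """Sequentially estimate a Monte Carlo p-value and stop as soon as a
    rough z-sigma band around the estimate clears the threshold.
    `exceed()` returns True when a simulated statistic beats the observed one.
    Returns ('rejected' | 'non-rejected' | 'undecided', p_hat)."""
    hits, n = 0, 0
    p_hat = 0.5
    while n < max_draws:
        hits += sum(exceed() for _ in range(batch))
        n += batch
        p_hat = (hits + 1) / (n + 1)                  # add-one Monte Carlo p-value
        half = z * (p_hat * (1 - p_hat) / n) ** 0.5   # normal-approx. margin
        if p_hat + half < threshold:
            return 'rejected', p_hat
        if p_hat - half > threshold:
            return 'non-rejected', p_hat
    return 'undecided', p_hat

random.seed(0)
# True p-value 0.40 tested against threshold 0.05: clearly non-rejected,
# so sampling stops early instead of running all max_draws simulations.
decision, p_hat = mc_decide(lambda: random.random() < 0.40, threshold=0.05)
```

Hypotheses whose *p*-values sit far from the threshold are decided after few simulations; the sampling budget concentrates on the borderline ones, which is the source of the computational savings.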

We tackle an important although rarely addressed question of accounting for a variety of asymmetries frequently observed in stochastic temporal/spatial records. First, we review some measures intended to capture such asymmetries that have been introduced on various occasions in the past, and then propose a family of measures that is motivated by Rice's formula for crossing-level distributions of the slope. We utilize those asymmetry measures to demonstrate how a class of second-order models built on the skewed Laplace distributions can account for sample path asymmetries. It is shown that these models are capable of mimicking not only distributional skewness but also more complex geometrical asymmetries in the sample path such as tilting, front-back slope asymmetry and time irreversibility. Simple moment-based estimation techniques are briefly discussed to allow direct application to modelling and fitting actual records.

We propose a new model for multivariate Markov chains of order one or higher on the basis of the mixture transition distribution (MTD) model. We call it the MTD-Probit. The proposed model presents two attractive features: it is completely free of constraints, thereby facilitating the estimation procedure, and it is more precise at estimating the transition probabilities of a multivariate or higher-order Markov chain than the standard MTD model.
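In the standard MTD model that the proposal builds on, the transition distribution is a convex combination of rows of a single transition matrix, selected by the lagged states; a minimal sketch (the MTD-Probit reparametrization itself is not reproduced here):

```python
import numpy as np

def mtd_transition(Q, lags, lam):
    """MTD transition distribution: a convex combination of the rows of the
    transition matrix Q selected by the lagged states `lags`, with weights lam."""
    return sum(l * Q[i] for l, i in zip(lam, lags))

Q = np.array([[0.7, 0.3],
              [0.2, 0.8]])
# Lag-1 state is 0, lag-2 state is 1; lag weights 0.6 and 0.4 sum to one.
p = mtd_transition(Q, lags=[0, 1], lam=[0.6, 0.4])
# 0.6*[0.7, 0.3] + 0.4*[0.2, 0.8] = [0.50, 0.50]
```

The constraints that make the standard MTD model awkward to estimate (non-negative weights summing to one, jointly with valid rows of Q) are exactly what the proposed probit formulation removes.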

In this paper, we consider the problem of testing for a parameter change in Poisson autoregressive models. We suggest two types of cumulative sum (CUSUM) tests, namely, those based on estimates and residuals. We first demonstrate that the conditional maximum likelihood estimator (CMLE) is strongly consistent and asymptotically normal and then construct the CMLE-based CUSUM test. It is shown that under regularity conditions, its limiting null distribution is a function of independent Brownian bridges. Next, we construct the residual-based CUSUM test and derive its limiting null distribution. Simulation results are provided for illustration. A real-data analysis is performed on data for polio incidence and campylobacteriosis infections.
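A residual-based CUSUM statistic of the usual form can be sketched as follows; Gaussian residuals with a mid-sample mean shift stand in for the Poisson-autoregression residuals of the paper:

```python
import numpy as np

def cusum_stat(resid):
    """CUSUM statistic max_k |S_k - (k/n) S_n| / (sigma_hat * sqrt(n)),
    where S_k is the k-th partial sum of the residuals."""
    n = len(resid)
    s = np.cumsum(resid)
    dev = np.abs(s - np.arange(1, n + 1) / n * s[-1])
    return dev.max() / (resid.std() * np.sqrt(n))

rng = np.random.default_rng(2)
stable = cusum_stat(rng.normal(0.0, 1.0, 1000))               # no parameter change
broken = cusum_stat(np.concatenate([rng.normal(0.0, 1.0, 500),
                                    rng.normal(2.0, 1.0, 500)]))  # mid-sample shift
```

Under the null the statistic behaves like the supremum of a Brownian bridge, matching the limiting distributions derived in the paper, while a parameter change inflates it sharply.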

The aim of this article is to develop methodology for detecting influential observations in crossover models with random individual effects. Various case-weighted perturbations are performed. We obtain the influence of the perturbations on each parameter estimator and on their dispersion matrices. The results show that closed-form expressions for the influence can be obtained through the residuals of mixed linear models. Some graphical tools are also presented.

We consider the problem of estimating the proportion ** θ** of true null hypotheses in a multiple testing context. The setup is classically modelled through a semiparametric mixture with two components: a uniform distribution on interval