With competing risks data, one often needs to assess the treatment and covariate effects on the cumulative incidence function. Fine and Gray proposed a proportional hazards regression model for the subdistribution of a competing risk with the assumption that the censoring distribution and the covariates are independent. Covariate-dependent censoring sometimes occurs in medical studies. In this paper, we study the proportional hazards regression model for the subdistribution of a competing risk with proper adjustments for covariate-dependent censoring. We consider a covariate-adjusted weight function by fitting the Cox model for the censoring distribution and using the predictive probability for each individual. Our simulation study shows that the covariate-adjusted weight estimator is essentially unbiased when the censoring time depends on the covariates, and the covariate-adjusted weight approach works well for the variance estimator as well. We illustrate our methods with bone marrow transplant data from the Center for International Blood and Marrow Transplant Research. Here, cancer relapse and death in complete remission are two competing risks.

In this paper, we consider the problem of estimating the matrix of regression coefficients in multivariate regression models with unknown change-points. More precisely, we consider the case where the target parameter satisfies an uncertain linear restriction. Under general conditions, we propose a class of estimators that includes as special cases shrinkage estimators (SEs) as well as the unrestricted and restricted estimators. We also derive a more general condition for the SEs to dominate the unrestricted estimator. To this end, we extend some results underlying the multidimensional version of the mixingale central limit theorem as well as some important identities for deriving the risk function of SEs. Finally, we present some simulation studies that corroborate the theoretical findings.

The generalized estimating equations (GEE) approach has attracted considerable interest for the analysis of correlated response data. This paper considers the model selection criterion based on the multivariate quasi-likelihood (MQL) in the GEE framework. The GEE approach is closely related to the MQL. We derive a necessary and sufficient condition for the uniqueness of the risk function based on the MQL by using properties of differential geometry. Furthermore, we establish a formal derivation of a model selection criterion as an asymptotically unbiased estimator of the prediction risk under this condition, and we explicitly take into account the effect of estimating the correlation matrix used in the GEE procedure.

Hierarchical models defined by means of directed, acyclic graphs are a powerful and widely used tool for Bayesian analysis of problems of varying degrees of complexity. A simulation-based method for model criticism in such models has been suggested by O'Hagan in the form of a conflict measure based on contrasting separate local information sources about each node in the graph. This measure is, however, not well calibrated. In order to rectify this, alternative mutually similar tail probability-based measures have been proposed independently and have been proved to be uniformly distributed under the assumed model in quite general normal models with known covariance matrices. In the present paper, we extend this result to a variety of models. An advantage of this is that computationally costly pre-calibration schemes needed for some other suggested methods can be avoided. Another advantage is that non-informative prior distributions can be used when performing model criticism.

It is well known that adaptive sequential nonparametric estimation of differentiable functions with assigned mean integrated squared error and minimax expected stopping time is impossible. In other words, no sequential estimator can compete with an oracle estimator that knows how many derivatives an estimated curve has. Differentiable functions are typical in probability density and regression models but not in spectral density models, where the considered functions are typically smoother. This paper shows that for a large class of spectral densities, which includes spectral densities of classical autoregressive moving average processes, adaptive minimax sequential estimation with assigned mean integrated squared error is possible. Furthermore, a two-stage sequential procedure is proposed, which is minimax and adaptive to the smoothness of an underlying spectral density.

In many applications, the parameters of interest are estimated by solving non-smooth estimating functions with *U*-statistic structure. Because the asymptotic covariance matrix of the estimator generally involves the underlying density function, resampling methods are often used to bypass the difficulty of non-parametric density estimation. Despite its simplicity, the resulting covariance matrix estimator depends on the nature of resampling, and the method can be time-consuming when the number of replications is large. Furthermore, the inferences are based on the normal approximation, which may not be accurate for practical sample sizes. In this paper, we propose a jackknife empirical likelihood-based inferential procedure for non-smooth estimating functions. Standard chi-square distributions are used to calculate the *p*-value and to construct confidence intervals. Extensive simulation studies and two real examples are provided to illustrate its practical utility.
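The two ingredients of the approach can be sketched in a few lines. The sketch below is ours, not the authors' procedure: it uses the sample mean as a deliberately simple stand-in for a non-smooth *U*-statistic functional, forms jackknife pseudo-values, and profiles the empirical log-likelihood ratio by a scalar Newton solve for the Lagrange multiplier. (For the mean, the pseudo-values reduce to the observations themselves, which makes the mechanics easy to check.)

```python
import math

def el_log_ratio(values, mu, iters=50):
    """-2 log empirical likelihood ratio for the mean of `values` at mu,
    solving sum((v - mu) / (1 + lam * (v - mu))) = 0 for lam by Newton."""
    d = [v - mu for v in values]
    lam = 0.0
    for _ in range(iters):
        g = sum(di / (1.0 + lam * di) for di in d)
        gp = -sum(di * di / (1.0 + lam * di) ** 2 for di in d)
        step = g / gp
        lam -= step
        if abs(step) < 1e-12:
            break
    return 2.0 * sum(math.log(1.0 + lam * di) for di in d)

def jackknife_pseudo_values(values, stat):
    """Jackknife pseudo-values: n * stat(all) - (n - 1) * stat(leave-one-out)."""
    n = len(values)
    full = stat(values)
    return [n * full - (n - 1) * stat(values[:i] + values[i + 1:])
            for i in range(n)]

x = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0]
pv = jackknife_pseudo_values(x, stat=lambda v: sum(v) / len(v))
# for the mean, pseudo-values equal the observations; in general they do not
```

The statistic `el_log_ratio` is approximately zero at the point estimate and grows away from it, which is what licenses the chi-square calibration mentioned in the abstract.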

We present a unifying approach to multiple testing procedures for sequential (or streaming) data by giving sufficient conditions for a sequential multiple testing procedure to control the familywise error rate (FWER). Together, we call these conditions a ‘rejection principle for sequential tests’, which we then apply to some existing sequential multiple testing procedures to give a simplified understanding of their FWER control. Next, the principle is applied to derive two new sequential multiple testing procedures with provable FWER control, one for testing hypotheses in order and another for closed testing. Examples of these new procedures are given by applying them to a chromosome aberration data set and finding the maximum safe dose of a treatment.

We develop an approach to evaluating frequentist model averaging procedures by considering them in a simple situation in which there are two nested linear regression models over which we average. We introduce a general class of model-averaged confidence intervals, obtain exact expressions for the coverage and the scaled expected length of the intervals, and use these to compute these quantities for the model-averaged profile likelihood (MPL) and model-averaged tail area confidence intervals proposed by D. Fletcher and D. Turek. We show that the MPL confidence intervals can perform more poorly than the standard confidence interval used after model selection but ignoring the model selection process. The model-averaged tail area confidence intervals perform better than the MPL and post-model-selection confidence intervals but, for the examples that we consider, offer little advantage over simply using the standard confidence interval for *θ* under the full model, with the same nominal coverage.

We consider a general class of prior distributions for nonparametric Bayesian estimation which uses finite random series with a random number of terms. A prior is constructed through distributions on the number of basis functions and the associated coefficients. We derive a general result on adaptive posterior contraction rates for all smoothness levels of the target function in the true model by constructing an appropriate ‘sieve’ and applying the general theory of posterior contraction rates. We apply this general result to several statistical problems such as density estimation, various nonparametric regressions, classification, spectral density estimation and functional regression. The prior can be viewed as an alternative to the commonly used Gaussian process prior, but properties of the posterior distribution can be analysed by relatively simpler techniques. An interesting approximation property of B-spline basis expansion established in this paper allows a canonical choice of prior on coefficients in a random series and allows a simple computational approach without using Markov chain Monte Carlo methods. A simulation study is conducted to show that the accuracies of the Bayesian estimators based on the random series prior and the Gaussian process prior are comparable. We apply the method to the Tecator data using functional regression models.

We describe a generalized linear mixed model in which all random effects may evolve over time. Random effects have a discrete support and follow a first-order Markov chain. Constraints control the size of the parameter space and possibly yield blocks of time-constant random effects. We illustrate with an application to the relationship between health education and depression in a panel of adolescents, where the random effects are highly dimensional and separately evolve over time.

This paper is concerned with studying the dependence structure between two random variables *Y*_{1} and *Y*_{2} in the presence of a covariate *X*, which affects both marginal distributions but not the dependence structure. This is reflected in the property that the conditional copula of *Y*_{1} and *Y*_{2} given *X* does not depend on the value of *X*. This latter independence often appears as a simplifying assumption in pair-copula constructions. We introduce a general estimator for the copula in this specific setting and establish its consistency. Moreover, we consider some special cases, such as parametric or nonparametric location-scale models for the effect of the covariate *X* on the marginals of *Y*_{1} and *Y*_{2} and show that in these cases, weak convergence of the estimator, at √*n*-rate, holds. The theoretical results are illustrated by simulations and a real data example.

Recently, non-uniform sampling has been suggested in microscopy to increase efficiency. More precisely, probability proportional to size (PPS) sampling has been introduced, where the probability of sampling a unit in the population is proportional to the value of an auxiliary variable. In the microscopy application, the sampling units are fields of view, and the auxiliary variables are easily observed approximations to the variables of interest. Unfortunately, often some auxiliary variables vanish, that is, are zero-valued. Consequently, part of the population is inaccessible in PPS sampling. We propose a modification of the design based on a stratification idea, for which an optimal solution can be found, using a model-assisted approach. The new optimal design also applies to the case where ‘vanish’ refers to missing auxiliary variables and has independent interest in sampling theory. We verify robustness of the new approach by numerical results, and we use real data to illustrate the applicability.
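The inaccessibility problem is easy to see numerically. The sketch below (function names are ours; it shows plain PPS with replacement, not the paper's stratified modification) computes first-order inclusion probabilities proportional to the auxiliary variable: a unit whose auxiliary value is zero has inclusion probability zero and can never enter the sample.

```python
import random

def pps_inclusion_probs(aux, n):
    """First-order inclusion probabilities for a PPS sample of size n:
    pi_i = n * x_i / sum(x), capped at 1."""
    total = sum(aux)
    return [min(1.0, n * x / total) for x in aux]

def pps_sample_with_replacement(units, aux, n, rng=random):
    """Draw a PPS sample (with replacement), weighting by the auxiliary variable."""
    return rng.choices(units, weights=aux, k=n)

aux = [4.0, 1.0, 0.0, 3.0, 2.0]          # unit 2 has a vanishing auxiliary value
pi = pps_inclusion_probs(aux, n=2)        # [0.8, 0.2, 0.0, 0.6, 0.4]
sample = pps_sample_with_replacement(list(range(5)), aux, 100,
                                     rng=random.Random(1))
# unit 2 never appears in the sample: it is inaccessible under PPS
```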

We provide a consistent specification test for generalized autoregressive conditional heteroscedastic (GARCH(1,1)) models based on a test statistic of Cramér-von Mises type. Because the limit distribution of the test statistic under the null hypothesis depends on unknown quantities in a complicated manner, we propose a model-based (semiparametric) bootstrap method to approximate critical values of the test and to verify its asymptotic validity. Finally, we illuminate the finite sample behaviour of the test by some simulations.
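For intuition, the Cramér-von Mises statistic itself is simple to compute from probability-integral-transformed data; in the paper's setting one would plug in quantities derived from the fitted GARCH(1,1) model rather than raw observations, and calibrate by the proposed bootstrap. A minimal sketch (our own naming) of the classical one-sample form:

```python
def cramer_von_mises(data, cdf):
    """Cramér-von Mises statistic
    W^2 = 1/(12n) + sum_{i=1}^{n} (F(x_(i)) - (2i - 1)/(2n))^2,
    measuring the distance between the empirical and hypothesized CDFs."""
    n = len(data)
    u = sorted(cdf(x) for x in data)           # probability integral transform
    return 1.0 / (12 * n) + sum((u[i] - (2 * i + 1) / (2 * n)) ** 2
                                for i in range(n))

# toy check against the uniform CDF on [0, 1]
w2 = cramer_von_mises([0.1, 0.5, 0.9], cdf=lambda x: x)
```

Large values of `w2` indicate a discrepancy between the data and the hypothesized distribution; the difficulty the abstract addresses is that its null distribution, after parameter estimation, has no usable closed form.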

We consider a recurrent event wherein the inter-event times are independent and identically distributed with a common absolutely continuous distribution function *F*. In this article, interest is in the problem of testing the null hypothesis that *F* belongs to some parametric family where the *q*-dimensional parameter is unknown. We propose a general chi-squared test in which cell boundaries are data dependent. An estimator of the parameter obtained by minimizing a quadratic form resulting from a properly scaled vector of differences between observed and expected frequencies is used to construct the test. This estimator is known as the *minimum chi-square estimator*. Large sample properties of the proposed test statistic are established using empirical processes tools. A simulation study is conducted to assess the performance of the test under parameter misspecification, and our procedures are applied to a fleet of Boeing 720 jet planes' air conditioning system failures.
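A crude sketch of the two ideas: cells whose boundaries are model quantiles (hence data dependent, through the estimated parameter, with equal expected counts *n*/*k*), and a parameter chosen to minimize the chi-squared quadratic form. The grid search below is our simplistic stand-in for the paper's minimum chi-square estimator, shown here for an exponential null family:

```python
import math
from bisect import bisect_right

def min_chisq_grid(data, k, quantile, params_grid):
    """Chi-squared statistic with k equiprobable, data-dependent cells;
    the parameter is picked by minimizing the statistic over a grid
    (a crude stand-in for the minimum chi-square estimator)."""
    n = len(data)
    best = None
    for theta in params_grid:
        # boundaries are model quantiles at j/k, so each expected count is n/k
        bounds = [quantile(j / k, theta) for j in range(1, k)]
        xs = sorted(data)
        counts, prev = [], 0
        for b in bounds:
            idx = bisect_right(xs, b)
            counts.append(idx - prev)
            prev = idx
        counts.append(n - prev)
        stat = sum((o - n / k) ** 2 / (n / k) for o in counts)
        if best is None or stat < best[0]:
            best = (stat, theta)
    return best  # (minimized statistic, minimum chi-square estimate)

data = [0.1, 0.2, 0.4, 0.7, 1.1, 1.6, 2.3, 3.2]   # toy inter-event times
exp_quantile = lambda p, rate: -math.log(1.0 - p) / rate
stat, rate_hat = min_chisq_grid(data, k=4, quantile=exp_quantile,
                                params_grid=[0.5, 0.75, 1.0, 1.5])
```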

We consider the variance estimation of the weighted likelihood estimator (WLE) under two-phase stratified sampling without replacement. The asymptotic variance of the WLE in many semiparametric models contains unknown functions or does not have a closed form. The standard approach, based on inverse probability weighted (IPW) sample variances of an estimated influence function, is then not available in these models. To address this issue, we develop a variance estimation procedure for the WLE in a general semiparametric model. The phase I variance is estimated by taking a numerical derivative of the IPW log likelihood. The phase II variance is estimated based on the bootstrap for a stratified sample in a finite population. Despite the theoretical difficulty caused by dependent observations due to sampling without replacement, we establish the (bootstrap) consistency of our estimators. Finite sample properties of our method are illustrated in a simulation study.
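The phase II ingredient can be illustrated with a naive version: resample with replacement within each stratum, keeping stratum sizes fixed, and re-evaluate the statistic. This sketch (names ours) ignores the finite-population, without-replacement correction that the paper's bootstrap is designed to handle:

```python
import random

def stratified_bootstrap(strata, stat, reps, seed=0):
    """Naive stratified bootstrap: resample with replacement within each
    stratum, preserving stratum sizes, and recompute the statistic."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        resampled = {h: [rng.choice(obs) for _ in obs]
                     for h, obs in strata.items()}
        out.append(stat(resampled))
    return out

strata = {"h1": [1.0, 2.0, 3.0], "h2": [10.0, 12.0]}

def pooled_mean(s):
    vals = [v for obs in s.values() for v in obs]
    return sum(vals) / len(vals)

boot = stratified_bootstrap(strata, pooled_mean, reps=200)
# the spread of `boot` estimates the sampling variability of the pooled mean
```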

Left-truncation occurs frequently in survival studies, and it is well known how to deal with this for univariate survival times. However, there are few results on how to estimate dependence parameters and regression effects in semiparametric models for clustered survival data with delayed entry. Surprisingly, existing methods only deal with special cases. In this paper, we clarify different kinds of left-truncation and suggest estimators for semiparametric survival models under specific truncation schemes. The large-sample properties of the estimators are established. Small-sample properties are investigated via simulation studies, and the suggested estimators are used in a study of prostate cancer based on the Finnish twin cohort where a twin pair is included only if both twins were alive in 1974.

In this paper, the maximum spacing method is considered for multivariate observations. Nearest neighbour balls are used as a multidimensional analogue to univariate spacings. A class of information-type measures is used to generalize the concept of maximum spacing estimators. Weak and strong consistency of these generalized maximum spacing estimators are proved both when the assigned model class is correct and when the true density is not a member of the model class. An example of the generalized maximum spacing method in a model validation context is discussed.

Estimating the fibre length distribution in composite materials is of practical relevance in materials science. We propose an estimator for the fibre length distribution using the point process of fibre endpoints as input. Assuming that this point process is a realization of a Neyman–Scott process, we use results for the reduced second moment measure to derive a consistent and unbiased estimator for the fibre length distribution. We introduce various versions of the estimator taking anisotropy or errors in the observation into account. The estimator is evaluated using a heuristic for its mean squared error as well as a simulation study. Finally, the estimator is applied to the fibre endpoint process extracted from a tomographic image of a glass fibre composite.

In this paper, we consider non-parametric copula inference under bivariate censoring. Based on an estimator of the joint cumulative distribution function, we define a discrete and two smooth estimators of the copula. The construction that we propose is valid for a large range of estimators of the distribution function and therefore for a large range of bivariate censoring frameworks. Under some conditions on the tails of the distributions, the weak convergence of the corresponding copula processes is obtained in *l*^{∞}([0,1]^{2}). We derive the uniform convergence rates of the copula density estimators deduced from our smooth copula estimators. Investigation of the practical behaviour of these estimators is performed through a simulation study and two real data applications, corresponding to different censoring settings. We use our non-parametric estimators to define a goodness-of-fit procedure for parametric copula models. A new bootstrap scheme is proposed to compute the critical values.

We discuss the problem of selecting among alternative parametric models within the Bayesian framework. For model selection problems, which involve non-nested models, the common objective choice of a prior on the model space is the uniform distribution. The same applies to situations where the models are nested. It is our contention that assigning equal prior probability to each model is oversimplistic. Consequently, we introduce a novel approach to objectively determine model prior probabilities, conditionally on the choice of priors for the parameters of the models. The idea is based on the notion of the *worth* of having each model within the selection process. At the heart of the procedure is the measure of this *worth* using the Kullback–Leibler divergence between densities from different models.

We propose a random partition model that implements prediction with many candidate covariates and interactions. The model is based on a modified product partition model that includes a regression on covariates by favouring homogeneous clusters in terms of these covariates. Additionally, the model allows for a cluster-specific choice of the covariates that are included in this evaluation of homogeneity. The variable selection is implemented by introducing a set of cluster-specific latent indicators that include or exclude covariates. The proposed model is motivated by an application to predicting mortality in an intensive care unit in Lisboa, Portugal.

We propose a flexible prior model for the parameters of binary Markov random fields (MRF), defined on rectangular lattices and with maximal cliques defined from a template maximal clique. The prior model allows higher-order interactions to be included. We also define a reversible jump Markov chain Monte Carlo algorithm to sample from the associated posterior distribution. The number of possible parameters for a higher-order MRF becomes high, even for small template maximal cliques. We define a flexible parametric form where the parameters have interpretation as potentials for clique configurations, and limit the effective number of parameters by assigning a priori discrete probabilities for events where groups of parameter values are equal. To cope with the computationally intractable normalising constant of MRFs, we adopt a previously defined approximation of binary MRFs. We demonstrate the flexibility of our prior formulation with simulated and real data examples.

In this paper, we consider a mixed compound Poisson process, that is, a random sum of independent and identically distributed (*i.i.d*.) random variables where the number of terms is a Poisson process with random intensity. We study nonparametric estimators of the jump density by specific deconvolution methods. Firstly, assuming that the random intensity has exponential distribution with unknown expectation, we propose two types of estimators based on the observation of an *i.i.d*. sample. Risk bounds and adaptive procedures are provided. Then, with no assumption on the distribution of the random intensity, we propose two non-parametric estimators of the jump density based on the joint observation of the number of jumps and the random sum of jumps. Risk bounds are provided, leading to unusual rates for one of the two estimators. The methods are implemented and compared via simulations.
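The observation scheme is easy to simulate. The sketch below (our own construction, matching the abstract's first setting: exponential random intensity; the unit-mean exponential jumps are an arbitrary illustrative choice) draws the pair (number of jumps, random sum of jumps) that the second part of the paper takes as input:

```python
import math
import random

def poisson_draw(rng, lam):
    """Knuth's multiplication method for a Poisson(lam) draw
    (adequate for the moderate intensities used here)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def mixed_compound_poisson(rng, t=1.0, lam_mean=2.0):
    """One draw of (N, S) with N | Lambda ~ Poisson(Lambda * t),
    Lambda ~ Exponential(mean lam_mean), and S the sum of N i.i.d.
    Exp(1) jumps (the jump law is an arbitrary choice for illustration)."""
    lam = rng.expovariate(1.0 / lam_mean)      # random intensity
    n = poisson_draw(rng, lam * t)
    return n, sum(rng.expovariate(1.0) for _ in range(n))

rng = random.Random(42)
draws = [mixed_compound_poisson(rng) for _ in range(20000)]
mean_n = sum(n for n, _ in draws) / len(draws)
# E[N] = E[Lambda] * t = 2.0, so the Monte Carlo mean should be close
```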

In their recent work, Jiang and Yang studied six classical likelihood ratio test statistics in a high-dimensional setting. Assuming that a random sample of size *n* is observed from a *p*-dimensional normal population, they derived the central limit theorems (CLTs) when *p* and *n* are proportional to each other; these differ from the classical chi-square limits obtained as *n* goes to infinity while *p* remains fixed. In this paper, by developing a new tool, we prove that the mentioned six CLTs hold in a more applicable setting: *p* goes to infinity, and *p* can be very close to *n*. This is an almost necessary and sufficient condition for the CLTs. Simulations of histograms, comparisons of sizes and powers with those of the classical chi-square approximations and discussions are presented afterwards.

COGARCH models are continuous time versions of the well-known GARCH models of financial returns. The first aim of this paper is to show how the method of prediction-based estimating functions can be applied to draw statistical inference from observations of a COGARCH(1,1) model if the higher-order structure of the process is clarified. A second aim of the paper is to provide recursive expressions for the joint moments of any fixed order of the process. Asymptotic results are given, and a simulation study shows that the method of prediction-based estimating functions outperforms the other available estimation methods.

In measurement error problems, two major and consistent estimation methods are the conditional score and the corrected score. They are functional methods that require no parametric assumptions on mismeasured covariates. The conditional score requires that a suitable sufficient statistic for the mismeasured covariate can be found, while the corrected score requires that the score function of interest can be estimated without bias. These assumptions limit their ranges of applications. The extensively corrected score proposed here is an extension of the corrected score. It yields consistent estimators in many cases where neither the conditional score nor the corrected score is feasible. We demonstrate its construction in generalized linear models and the Cox proportional hazards model, assess its performance by simulation studies and illustrate its implementation by two real examples.

The paper describes a generalized iterative proportional fitting procedure that can be used for maximum likelihood estimation in a special class of the general log-linear model. The models in this class, called relational, apply to multivariate discrete sample spaces that do not necessarily have a Cartesian product structure and may not contain an overall effect. When applied to the cell probabilities, the models without the overall effect are curved exponential families and the values of the sufficient statistics are reproduced by the MLE only up to a constant of proportionality. The paper shows that Iterative Proportional Fitting, Generalized Iterative Scaling, and Improved Iterative Scaling fail to work for such models. The algorithm proposed here is based on iterated Bregman projections. As a by-product, estimates of the multiplicative parameters are also obtained. An implementation of the algorithm is available as an R-package.
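For contrast with the generalized procedure the paper proposes, the classical iterative proportional fitting that it shows to fail for relational models is a few lines of code: alternately rescale rows and columns of a contingency table until both margins match their targets. A minimal sketch (our own naming):

```python
def ipf(table, row_targets, col_targets, iters=100, tol=1e-10):
    """Classical iterative proportional fitting on a two-way table:
    alternately rescale rows and columns to match the target margins."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, r in enumerate(row_targets):               # match row margins
            s = sum(t[i])
            t[i] = [v * r / s for v in t[i]]
        maxdev = 0.0
        for j, c in enumerate(col_targets):               # match column margins
            s = sum(t[i][j] for i in range(len(t)))
            for i in range(len(t)):
                t[i][j] *= c / s
            maxdev = max(maxdev, abs(s - c))
        if maxdev < tol:
            break
    return t

fitted = ipf([[1.0, 1.0], [1.0, 1.0]],
             row_targets=[30.0, 70.0], col_targets=[40.0, 60.0])
# starting from a uniform table, the fit is the independence table [[12, 18], [28, 42]]
```

This alternation works because each rescaling is a projection onto a margin constraint for models with a Cartesian product structure and an overall effect; the paper's point is that, for relational models without the overall effect, these projections must be replaced by iterated Bregman projections.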

This work provides a class of non-Gaussian spatial Matérn fields which are useful for analysing geostatistical data. The models are constructed as solutions to stochastic partial differential equations driven by generalized hyperbolic noise and are incorporated in a standard geostatistical setting with irregularly spaced observations, measurement errors and covariates. A maximum likelihood estimation technique based on the Monte Carlo expectation-maximization algorithm is presented, and a Monte Carlo method for spatial prediction is derived. Finally, an application to precipitation data is presented, and the performance of the non-Gaussian models is compared with standard Gaussian and transformed Gaussian models through cross-validation.

We present a novel methodology for estimating the parameters of a finite mixture model (FMM) based on partially rank-ordered set (PROS) sampling and use it in a fishery application. A PROS sampling design first selects a simple random sample of fish and creates partially rank-ordered judgement subsets by dividing units into subsets of prespecified sizes. The final measurements are then obtained from these partially ordered judgement subsets. The traditional expectation–maximization algorithm is not directly applicable for these observations. We propose a suitable expectation–maximization algorithm to estimate the parameters of the FMMs based on PROS samples. We also study the problem of classification of the PROS sample into the components of the FMM. We show that the maximum likelihood estimators based on PROS samples perform substantially better than their simple random sample counterparts even with small samples. The results are used to classify a fish population using the length-frequency data.
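The baseline against which the paper's PROS-based algorithm is built is the classical EM for a finite mixture, which applies directly to a simple random sample. The sketch below (ours; two normal components with a common, fixed variance to keep it short, whereas the paper handles general FMMs and the extra ranking information) shows the E-step/M-step cycle:

```python
import math

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def em_two_normals(data, mu1, mu2, sd=1.0, w=0.5, iters=100):
    """EM for a two-component normal mixture (common fixed variance);
    returns the mixing weight and the two component means."""
    for _ in range(iters):
        # E-step: responsibility of component 1 for each observation
        r = []
        for x in data:
            a = w * normal_pdf(x, mu1, sd)
            b = (1.0 - w) * normal_pdf(x, mu2, sd)
            r.append(a / (a + b))
        # M-step: update mixing weight and component means
        s = sum(r)
        w = s / len(data)
        mu1 = sum(ri * x for ri, x in zip(r, data)) / s
        mu2 = sum((1.0 - ri) * x for ri, x in zip(r, data)) / (len(data) - s)
    return w, mu1, mu2

data = [0.0, 0.1, 0.2, 4.9, 5.0, 5.1]    # two well-separated toy "length" groups
w, mu1, mu2 = em_two_normals(data, mu1=0.0, mu2=4.0)
```

Classification, as in the abstract, then assigns each observation to the component with the larger responsibility; the PROS design changes the E-step by conditioning on the partial rank information.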

The particle Gibbs sampler is a systematic way of using a particle filter within Markov chain Monte Carlo. This results in an off-the-shelf Markov kernel on the space of state trajectories, which can be used to simulate from the full joint smoothing distribution for a state space model in a Markov chain Monte Carlo scheme. We show that the particle Gibbs Markov kernel is uniformly ergodic under rather general assumptions, which we will carefully review and discuss. In particular, we provide an explicit rate of convergence, which reveals that (i) for a fixed number of data points, the convergence rate can be made arbitrarily good by increasing the number of particles and (ii) under general mixing assumptions, the convergence rate can be kept constant by increasing the number of particles superlinearly with the number of observations. We illustrate the applicability of our result by studying in detail a common stochastic volatility model with a non-compact state space.
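The object underlying particle Gibbs is the conditional sequential Monte Carlo kernel; its uncondition ed ancestor, the bootstrap particle filter, is sketched below for intuition. This is our simplification (a linear-Gaussian state space model rather than the paper's stochastic volatility model, and no conditioning on a retained trajectory): propagate, weight by the observation likelihood, resample.

```python
import math
import random

def bootstrap_pf(ys, n_particles, phi=0.9, q_sd=1.0, r_sd=1.0, seed=0):
    """Bootstrap particle filter for x_t = phi * x_{t-1} + N(0, q_sd^2),
    y_t = x_t + N(0, r_sd^2); returns the filtering means."""
    rng = random.Random(seed)
    parts = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    means = []
    for y in ys:
        parts = [phi * x + rng.gauss(0.0, q_sd) for x in parts]        # propagate
        w = [math.exp(-0.5 * ((y - x) / r_sd) ** 2) for x in parts]    # weight
        tot = sum(w)
        w = [wi / tot for wi in w]
        means.append(sum(wi * x for wi, x in zip(w, parts)))           # filter mean
        parts = rng.choices(parts, weights=w, k=n_particles)           # resample
    return means

means = bootstrap_pf([0.5, 0.7, 0.2, -0.1], n_particles=500)
```

Particle Gibbs wraps this machinery into an MCMC kernel by keeping one reference trajectory fixed through the resampling steps; the paper's result quantifies how the number of particles governs that kernel's convergence rate.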

We develop statistical procedures for estimating shape and orientation of arbitrary three-dimensional particles. We focus on the case where particles cannot be observed directly, but only via sections. Volume tensors are used for describing particle shape and orientation, and we derive stereological estimators of the tensors. These estimators are combined to provide consistent estimators of the moments of the so-called particle cover density. The covariance structure associated with the particle cover density depends on the orientation and shape of the particles. For instance, if the distribution of the typical particle is invariant under rotations, then the covariance matrix is proportional to the identity matrix. We develop a non-parametric test for such isotropy. A flexible Lévy-based particle model is proposed, which may be analysed using a generalized method of moments in which the volume tensors enter. The developed methods are used to study the cell organization in the human brain cortex.

The linear regression model for right censored data, also known as the accelerated failure time model using the logarithm of survival time as the response variable, is a useful alternative to the Cox proportional hazards model. Empirical likelihood as a non-parametric approach has been demonstrated to have many desirable properties thanks to its robustness against model misspecification. However, the linear regression model with right censored data cannot directly benefit from the empirical likelihood for inferences, mainly because of the dependent elements in the estimating equations of the conventional approach. In this paper, we propose an empirical likelihood approach with a new estimating equation for linear regression with right censored data. A nested coordinate algorithm with majorization is used for solving the optimization problems with non-differentiable objective function. We show that Wilks' theorem holds for the new empirical likelihood. We also consider the variable selection problem with empirical likelihood when the number of predictors can be large. Because the new estimating equation is non-differentiable, a quadratic approximation is applied to study the asymptotic properties of penalized empirical likelihood. We prove the oracle properties and evaluate the properties with simulated data. We apply our method to a Surveillance, Epidemiology, and End Results small intestine cancer dataset.

We propose a new method for risk-analytic benchmark dose (BMD) estimation in a dose-response setting when the responses are measured on a continuous scale. For each dose level *d*, the observation *X*(*d*) is assumed to follow a normal distribution: *X*(*d*) ∼ N(*μ*(*d*), *σ*^{2}). No specific parametric form is imposed upon the mean *μ*(*d*), however. Instead, nonparametric maximum likelihood estimates of *μ*(*d*) and *σ* are obtained under a monotonicity constraint on *μ*(*d*). For purposes of quantitative risk assessment, a ‘hybrid’ form of risk function is defined for any dose *d* as *R*(*d*) = *P*[*X*(*d*) < *c*], where *c* > 0 is a constant independent of *d*. The BMD is then determined by inverting the *additional risk function* *R*_{A}(*d*) = *R*(*d*) − *R*(0) at some specified value of benchmark response. Asymptotic theory for the point estimators is derived, and a finite-sample study is conducted, using both real and simulated data. When a large number of doses are available, we propose an adaptive grouping method for estimating the BMD, which is shown to have optimal mean integrated squared error under appropriate designs.
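The inversion step is mechanical once *μ*(*d*) and *σ* are in hand. The sketch below (our own illustration: a toy linearly decreasing mean rather than the paper's nonparametric, monotone estimate, and *c* = 0 for simplicity) computes the hybrid additional risk under the normal model and inverts it at a benchmark response by bisection:

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def benchmark_dose(mu, sigma, c, bmr, lo=0.0, hi=100.0, tol=1e-8):
    """Invert the additional risk R_A(d) = P[X(d) < c] - P[X(0) < c] at
    level bmr by bisection, assuming R_A increases in d (monotone mu)."""
    def extra_risk(d):
        return norm_cdf((c - mu(d)) / sigma) - norm_cdf((c - mu(0.0)) / sigma)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if extra_risk(mid) < bmr:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# toy decreasing dose-response: the mean drops linearly with dose
bmd = benchmark_dose(mu=lambda d: 1.0 - 0.1 * d, sigma=1.0, c=0.0, bmr=0.1)
```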

Many model-free dimension reduction methods have been developed for high-dimensional regression data but have not paid much attention to problems with non-linear confounding. In this paper, we propose an inverse-regression method of dependent variable transformation for detecting the presence of non-linear confounding. The benefit of using geometrical information from our method is highlighted. A ratio estimation strategy is incorporated in our approach to enhance the interpretation of variable selection. This approach can be implemented not only in principal Hessian directions (PHD) but also in other recently developed dimension reduction methods. Several simulation examples are reported for illustration, and comparisons are made with sliced inverse regression and PHD applied in ignorance of non-linear confounding. An illustrative application to a real data set is also presented.

Supremum score test statistics are often used to evaluate hypotheses with unidentifiable nuisance parameters under the null hypothesis. Although these statistics provide an attractive framework to address non-identifiability under the null hypothesis, little attention has been paid to their distributional properties in small to moderate sample size settings. In situations where there are identifiable nuisance parameters under the null hypothesis, these statistics may behave erratically in realistic samples as a result of a non-negligible bias induced by substituting these nuisance parameters by their estimates under the null hypothesis. In this paper, we propose an adjustment to the supremum score statistics by subtracting the expected bias from the score processes and show that this adjustment does not alter the limiting null distribution of the supremum score statistics. Using a simple example from the class of zero-inflated regression models for count data, we show empirically and theoretically that the adjusted tests are superior in terms of size and power. The practical utility of this methodology is illustrated using count data in HIV research.

Panel count data arise in many fields, and a number of estimation procedures have been developed, along with two procedures for variable selection. In this paper, we discuss model selection and parameter estimation together. For the former, a focused information criterion (FIC) is presented, and for the latter, a frequentist model average (FMA) estimation procedure is developed. A main advantage of the FIC, which also distinguishes it from existing model selection methods, is that it emphasizes the accuracy of the estimation of the parameters of interest rather than of all parameters. Further efficiency gain can be achieved by the FMA estimation procedure because, unlike existing methods, it takes into account the variability at the stage of model selection. Asymptotic properties of the proposed estimators are established, and a simulation study suggests that the proposed methods work well in practical situations. An illustrative example is also provided. © 2014 Board of the Foundation of the Scandinavian Journal of Statistics

We propose the Laplace Error Penalty (LEP) function for variable selection in high-dimensional regression. Unlike penalty functions constructed from piecewise splines, the LEP is constructed as an exponential function with two tuning parameters and is infinitely differentiable everywhere except at the origin. With this construction, the LEP-based procedure acquires extra flexibility in variable selection, admits a unified derivative formula in optimization and is able to approximate the *L*_{0} penalty arbitrarily closely. We show that the LEP procedure can identify relevant predictors in exponentially high-dimensional regression with normal errors. We also establish the oracle property for the LEP estimator. Although not convex itself, the LEP yields a convex penalized least squares function under mild conditions if *p* is no greater than *n*. A coordinate descent majorization–minimization algorithm is introduced to implement the LEP procedure. In simulations and a real data analysis, the LEP methodology performs favorably compared with competing procedures.
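To illustrate the idea, here is a minimal sketch of an exponential-type penalty with two tuning parameters, assuming the form λ(1 − e^{−|t|/κ}). This form is smooth away from the origin and tends to the *L*_{0} penalty λ·1{t ≠ 0} as κ → 0⁺; the exact parameterization in the paper may differ.

```python
import numpy as np

def lep(t, lam, kappa):
    # assumed Laplace Error Penalty form: lam * (1 - exp(-|t|/kappa));
    # smooth everywhere except t = 0, and as kappa -> 0+ it approaches
    # the L0 penalty lam * 1{t != 0}
    return lam * (1.0 - np.exp(-np.abs(t) / kappa))

def lep_grad(t, lam, kappa):
    # derivative for t != 0, usable in MM / coordinate-descent updates
    return lam * np.sign(t) * np.exp(-np.abs(t) / kappa) / kappa

beta = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
pen = lep(beta, lam=1.0, kappa=0.01)   # ~ [1, 1, 0, 1, 1]: near the L0 penalty
```

Small κ makes the penalty nearly flat away from zero (so large coefficients are essentially unshrunken, consistent with the oracle property), while λ controls the overall penalty level.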

This paper develops a Bayesian control chart for the percentiles of the Weibull distribution when both its in-control and out-of-control parameters are unknown. The Bayesian approach improves parameter estimation for the small sample sizes that occur when monitoring rare events, such as in high-reliability applications. The chart monitors the parameters of the Weibull distribution directly, instead of transforming the data as most Weibull-based charts do to meet the normality assumption. The chart uses accumulated knowledge resulting from the likelihood of the current sample combined with the information given by both the initial prior knowledge and all past samples. The chart is adaptive because its control limits change (e.g. narrow) during Phase I. An example is presented, and good average run length properties are demonstrated.

The bootstrap variance estimate is widely used in semiparametric inference. However, its theoretical validity is a well-known open problem. In this paper, we provide a *first* theoretical study of the bootstrap moment estimates in semiparametric models. Specifically, we establish the bootstrap moment consistency of the Euclidean parameter, which immediately implies the consistency of *t*-type bootstrap confidence sets. It is worth pointing out that the only additional cost of achieving bootstrap moment consistency, compared with distribution consistency, is to strengthen the *L*_{1} maximal inequality condition required for the latter to an *L*_{p} maximal inequality condition for *p*≥1. The general *L*_{p} multiplier inequality developed in this paper is also of independent interest. These general conclusions hold for bootstrap methods with exchangeable bootstrap weights, for example, the non-parametric bootstrap and the Bayesian bootstrap. Our general theory is illustrated in the celebrated Cox regression model.
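The exchangeable-weights scheme is easy to sketch for a smooth functional such as the mean. The following minimal Bayesian bootstrap (Dirichlet(1, …, 1) weights) variance estimate is only an illustration of the weighting scheme, not of the semiparametric theory; for the sample mean it should closely match the classical variance formula.

```python
import numpy as np

rng = np.random.default_rng(1)

def bayesian_bootstrap_var(x, n_boot=2000):
    # exchangeable-weights bootstrap with Dirichlet(1,...,1) weights
    # (the Bayesian bootstrap); returns the bootstrap variance
    # estimate of the weighted-mean functional
    n = len(x)
    w = rng.dirichlet(np.ones(n), size=n_boot)  # each row sums to 1
    boot_means = w @ x                          # one draw per weight vector
    return boot_means.var(ddof=1)

x = rng.normal(5.0, 2.0, size=400)
v_boot = bayesian_bootstrap_var(x)
v_classic = x.var(ddof=1) / len(x)   # classical variance of the mean
# for the mean, v_boot ~= v_classic up to a factor (n-1)/(n+1)
```

Replacing the Dirichlet draw with multinomial counts divided by *n* gives the non-parametric bootstrap, another member of the exchangeable-weights family.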

We consider hypothesis testing problems for low-dimensional coefficients in a high-dimensional additive hazard model. A variance-reduced partial profiling estimator (VRPPE) is proposed and its asymptotic normality is established, which enables us to test the significance of each single coefficient when the data dimension is much larger than the sample size. Based on the p-values obtained from the proposed test statistics, we then apply a multiple testing procedure to identify significant coefficients and show that the false discovery rate can be controlled at the desired level. The proposed method is also extended to testing a low-dimensional sub-vector of coefficients. The finite sample performance of the proposed testing procedure is evaluated by simulation studies. We also apply it to two real data sets, one focusing on testing low-dimensional coefficients and the other on identifying significant coefficients through the proposed multiple testing procedure.

We give a simple proof of Bell's inequality in quantum mechanics using theory from causal interaction, which, in conjunction with experiments, demonstrates that the local hidden variable assumption is false. The proof sheds light on relationships between the notion of causal interaction and interference between treatments.

In this work, we develop a method of adaptive non-parametric estimation based on ‘warped’ kernels. The aim is to estimate a real-valued function *s* from a sample of random couples (*X*,*Y*). We deal with transformed data (Φ(*X*),*Y*), with Φ a one-to-one function, to build a collection of kernel estimators. The data-driven bandwidth selection is performed with a method inspired by Goldenshluger and Lepski (Ann. Statist., 39, 2011, 1608). The method makes it possible to handle various problems, such as additive and multiplicative regression, conditional density estimation, hazard rate estimation based on randomly right-censored data, and cumulative distribution function estimation from current-status data. The interest of the approach is threefold. First, the squared-bias/variance trade-off is realized automatically. Next, non-asymptotic risk bounds are derived. Lastly, the estimator is easily computed, thanks to its simple expression; a short simulation study is presented.
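The warping idea can be sketched for regression: transform the design by its empirical CDF so the warped covariates are approximately uniform, after which a plain kernel average needs no density in the denominator. A minimal sketch with a fixed bandwidth rather than the Goldenshluger–Lepski selection, and with all names hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)

def warped_kernel_regression(x_train, y_train, x_eval, h):
    # warp the design by the empirical CDF Phi_hat so Phi_hat(X_i) is
    # ~Uniform(0,1); then s_hat(x) = (1/n) sum_i Y_i K_h(Phi_hat(x) - Phi_hat(X_i))
    # needs no denominator, since the U(0,1) density is identically 1
    n = len(x_train)
    ranks = np.argsort(np.argsort(x_train))
    phi_train = (ranks + 1) / n                      # Phi_hat(X_i)
    phi_eval = np.searchsorted(np.sort(x_train), x_eval) / n
    out = []
    for u in phi_eval:
        k = np.exp(-0.5 * ((u - phi_train) / h) ** 2) / (h * np.sqrt(2 * np.pi))
        out.append(np.mean(y_train * k))             # Gaussian kernel average
    return np.array(out)

n = 2000
x = rng.exponential(1.0, n)              # deliberately non-uniform design
y = np.sin(x) + rng.normal(0, 0.2, n)
est = warped_kernel_regression(x, y, np.array([0.5, 1.0]), h=0.05)
# est should approximate sin(0.5) and sin(1.0)
```

The adaptive procedure in the paper would replace the fixed `h` with a data-driven comparison of estimators across a bandwidth collection.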

The causal assumptions, the study design and the data are the elements required for scientific inference in empirical research. The research is adequately communicated only if all of these elements and their relations are described precisely. Causal models with design describe the study design and the missing-data mechanism together with the causal structure and allow the direct application of causal calculus in the estimation of causal effects. The flow of the study is visualized by ordering the nodes of the causal diagram in two dimensions by their causal order and the time of observation. Conclusions about whether a causal or observational relationship can be estimated from the collected incomplete data can be drawn directly from the graph. Causal models with design offer a systematic and unifying view of scientific inference and increase the clarity and speed of communication. Examples of causal models with design for a case–control study, a nested case–control study, a clinical trial and a two-stage case–cohort study are presented.

For many stochastic models, it is difficult to make inference about the model parameters because it is impossible to write down a tractable likelihood given the observed data. A common solution is data augmentation in a Markov chain Monte Carlo (MCMC) framework. However, there are statistical problems where this approach has proved infeasible but where simulation from the model is straightforward, leading to the popularity of the approximate Bayesian computation algorithm. We introduce a forward simulation MCMC (fsMCMC) algorithm, which is primarily based upon simulation from the model. The fsMCMC algorithm formulates the simulation of the process explicitly as a data augmentation problem. By exploiting non-centred parameterizations, an efficient MCMC updating scheme for the parameters and augmented data is introduced, whilst maintaining straightforward simulation from the model. The fsMCMC algorithm is successfully applied to two distinct epidemic models, including a birth–death–mutation model that has previously been analysed only using approximate Bayesian computation methods.

For right-censored survival data, it is well known that the mean survival time can be consistently estimated when the support of the censoring time contains the support of the survival time. In practice, however, this condition can easily be violated because the follow-up of a study is usually restricted to a finite window. In this article, we show that the mean survival time is still estimable from a linear model when the support of some covariate(s) with non-zero coefficient(s) is unbounded, regardless of the length of follow-up. This implies that the mean survival time can be well estimated in practice when the support of the linear predictor is wide. The theoretical finding is further verified for finite samples by simulation studies. Simulations also show that, when both models are correctly specified, the linear model yields reasonable mean square prediction errors and outperforms the Cox model, particularly with heavy censoring and short follow-up time.

The Cox–Aalen model, obtained by replacing the baseline hazard function in the well-known Cox model with a covariate-dependent Aalen model, allows for both fixed and dynamic covariate effects. In this paper, we examine maximum likelihood estimation for a Cox–Aalen model based on interval-censored failure times with fixed covariates. The resulting estimator converges globally to the truth at a rate slower than the parametric rate, but its finite-dimensional component is asymptotically efficient. Numerical studies show that estimation via a constrained Newton method performs well in terms of both finite sample properties and processing time for moderate-to-large samples with few covariates. We conclude with an application of the proposed methods to assess risk factors for disease progression in psoriatic arthritis.

In this paper, we propose to use a special class of bivariate frailty models to study dependent censored data. The proposed models are closely linked to Archimedean copula models. We give sufficient conditions for the identifiability of this type of competing risks model. The proposed conditions are derived from a property shared by Archimedean copula models and satisfied by several well-known bivariate frailty models. Compared with the models studied by Heckman and Honoré and by Abbring and van den Berg, our models are more restrictive but can be identified with a discrete (even finite) covariate. Under our identifiability conditions, the expectation–maximization (EM) algorithm provides consistent estimates of the unknown parameters. Simulation studies show that our estimation procedure works quite well. We fit the Clayton copula model to a leukaemia data set with dependent censoring and end our paper with some discussion.
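For intuition, dependent failure and censoring times under a Clayton copula (for which Kendall's tau equals θ/(θ + 2)) can be simulated by conditional inversion. This is a minimal data-generating sketch with exponential margins, not the authors' EM estimation procedure:

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_clayton(n, theta):
    # conditional-inversion sampler for the Clayton copula
    # C(u1, u2) = (u1^-theta + u2^-theta - 1)^(-1/theta), theta > 0
    u1 = rng.uniform(size=n)
    v = rng.uniform(size=n)
    u2 = (u1 ** (-theta) * (v ** (-theta / (theta + 1.0)) - 1.0) + 1.0) ** (-1.0 / theta)
    return u1, u2

# dependent failure and censoring times via the copula, exponential margins
theta = 2.0                        # Kendall's tau = theta/(theta+2) = 0.5
u1, u2 = sample_clayton(20000, theta)
t_fail = -np.log(1 - u1)           # Exp(1) failure times
t_cens = -np.log(1 - u2) * 1.5     # censoring times, dependent on t_fail
observed = np.minimum(t_fail, t_cens)
```

Such draws make it easy to check, by simulation, how badly an independence-assuming estimator is biased when censoring is in fact dependent.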

Length-biased and right-censored failure time data arise in many fields, and their analysis has recently attracted a great deal of attention. Two examples of areas that often produce such data are epidemiological studies and cancer screening trials. In this paper, we discuss regression analysis of such data in the presence of missing covariates, for which no established inference procedure seems to exist. For the problem, we consider data arising from the proportional hazards model and propose two inverse probability weighted estimation procedures. The asymptotic properties of the resulting estimators are established, and an extensive simulation study conducted to evaluate the proposed methods suggests that they work well in practical situations.
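The inverse probability weighting idea can be sketched in a simpler setting: model the probability that a covariate is observed given always-observed data, then weight complete cases by its inverse. A minimal sketch under a missing-at-random mechanism with a logistic missingness model; all names are hypothetical, and this is neither of the paper's two procedures:

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_logistic(z, r, n_iter=25):
    # Newton-Raphson for P(R=1|z) = expit(a + b*z); hypothetical helper
    beta = np.zeros(2)
    X = np.column_stack([np.ones_like(z), z])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (r - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

n = 5000
z = rng.normal(size=n)                    # always-observed covariate
x = z + rng.normal(size=n)                # covariate subject to missingness
r = rng.uniform(size=n) < 1 / (1 + np.exp(-(0.5 + z)))  # MAR indicator
beta = fit_logistic(z, r.astype(float))
pi_hat = 1 / (1 + np.exp(-(beta[0] + beta[1] * z)))

# IPW estimate of E[x] from complete cases only (true value is 0)
mu_ipw = np.sum(r * x / pi_hat) / np.sum(r / pi_hat)
mu_naive = x[r].mean()                    # biased under MAR
```

The complete-case mean is pulled upward because observation is more likely for large `z`; the weights undo that selection.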

The problem of interest is to estimate the concentration curve and the area under the curve (AUC) by estimating the parameters of a linear regression model with an autocorrelated error process. We introduce a simple linear unbiased estimator of the concentration curve and the AUC. We show that this estimator constructed from a sampling design generated by an appropriate density is asymptotically optimal in the sense that it has exactly the same asymptotic performance as the best linear unbiased estimator. Moreover, we prove that the optimal design is robust with respect to a minimax criterion. When repeated observations are available, this estimator is consistent and has an asymptotic normal distribution. Finally, a simulated annealing algorithm is applied to a pharmacokinetic model with correlated errors.
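The AUC functional itself is simple to compute from sampled concentrations. Here is a minimal sketch of the standard trapezoidal estimate on a toy one-compartment curve; this is not the paper's asymptotically optimal design-based estimator, and the curve parameters are illustrative only.

```python
import numpy as np

def auc_trapezoid(times, conc):
    # area under the concentration curve by linear interpolation
    # between sampling times (the standard trapezoidal rule)
    times = np.asarray(times, dtype=float)
    conc = np.asarray(conc, dtype=float)
    return float(np.sum((conc[1:] + conc[:-1]) * np.diff(times)) / 2.0)

# toy one-compartment curve C(t) = 3(e^{-0.2 t} - e^{-1.5 t})
t = np.linspace(0.0, 24.0, 200)
c = 3.0 * (np.exp(-0.2 * t) - np.exp(-1.5 * t))
auc = auc_trapezoid(t, c)
# analytic value on [0, 24]: 3*((1 - e^{-4.8})/0.2 - (1 - e^{-36})/1.5) ~= 12.88
```

The paper's contribution concerns where to place the sampling times and how to weight the observations when the errors are autocorrelated, not the quadrature rule itself.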

We develop an easy and direct way to define and compute the fiducial distribution of a real parameter for both continuous and discrete exponential families. Furthermore, such a distribution satisfies the requirements to be considered a confidence distribution. Many examples are provided for models that, although very simple, are widely used in applications. A characterization of the families for which the fiducial distribution coincides with a Bayesian posterior is given, and the strict connection with the Jeffreys prior is shown. Asymptotic expansions of fiducial distributions are obtained without any further assumptions, and again, the relationship with objective Bayesian analysis is pointed out. Finally, using Edgeworth expansions, we compare the coverage of the fiducial intervals with that of other common intervals, proving the good behaviour of the former.

Self-regulating processes are stochastic processes whose local regularity, as measured by the pointwise Hölder exponent, is a function of amplitude. They seem to provide relevant models for various signals arising for example in geophysics or biomedicine. We propose in this work an estimator of the self-regulating function (that is, the function relating amplitude and Hölder regularity) of the self-regulating midpoint displacement process and study some of its properties. We prove that it is almost surely convergent and obtain a central limit theorem. Numerical simulations show that the estimator behaves well in practice.

Mixture models are commonly used in biomedical research to account for possible heterogeneity in the population. In this paper, we consider tests for homogeneity between two groups in the exponential tilt mixture models. A novel pairwise pseudolikelihood approach is proposed to eliminate the unknown nuisance function. We show that the corresponding pseudolikelihood ratio test has an asymptotic distribution, under the null hypothesis, given by the supremum of two squared Gaussian processes. To maintain the appeal of simplicity of conventional likelihood ratio tests, we propose two alternative tests, both shown to have a simple asymptotic distribution under the null. Simulation studies show that the proposed class of pseudolikelihood ratio tests performs well, controlling type I error and achieving power competitive with current tests. The proposed tests are illustrated with an example of partial differential expression detection using microarray data from prostate cancer patients.

Small area estimators in linear models are typically expressed as a convex combination of direct estimators and synthetic estimators from a suitable model. When the auxiliary information used in the model is measured with error, a new estimator, accounting for the measurement error in the covariates, has been proposed in the literature. Recently, for the area-level model, Ybarra & Lohr (Biometrika, 95, 2008, 919) suggested a suitable modification to the estimates of small area means based on the Fay & Herriot (J. Am. Stat. Assoc., 74, 1979, 269) model where some of the covariates are measured with error. They used a frequentist approach based on the method of moments. Adopting a Bayesian approach, we propose to rewrite the measurement error model as a hierarchical model; we use improper non-informative priors on the model parameters and show, under a mild condition, that the joint posterior distribution is proper and the marginal posterior distributions of the model parameters have finite variances. We conduct a simulation study exploring different scenarios. The Bayesian predictors we propose show smaller empirical mean squared errors than the frequentist predictors of Ybarra & Lohr (Biometrika, 95, 2008, 919), and they also appear more stable in terms of variability and bias. We apply the proposed methodology to two real examples.

This paper studies the asymptotic behaviour of the false discovery and non-discovery proportions of the dynamic adaptive procedure under some dependence structure. A Bahadur-type representation of the cut point in simultaneously performing a large number of tests is presented. The asymptotic bias decompositions of the false discovery and non-discovery proportions are given under some dependence structure. Beyond the existing literature, we find that the randomness due to the dynamic selection of the tuning parameter in estimating the true null rate serves as a source of approximation error in the Bahadur representation and enters the asymptotic bias term of the false discovery proportion and that of the false non-discovery proportion. The theory explains, to some extent, why some seemingly attractive dynamic adaptive procedures do not substantially outperform competing fixed adaptive procedures in some situations. Simulations support our theory and findings.

The aim of the paper is to study the problem of estimating the quantile function of a finite population. Attention is first focused on point estimation, and asymptotic results are obtained. Confidence intervals are then constructed based on (i) asymptotic results and (ii) a resampling technique based on rescaling the ‘usual’ bootstrap. A simulation study comparing asymptotic and resampling-based results, as well as an application to a real population, is finally presented.

We propose a new summary statistic for inhomogeneous intensity-reweighted moment stationarity spatio-temporal point processes. The statistic is defined in terms of the *n*-point correlation functions of the point process, and it generalizes the *J*-function when stationarity is assumed. We show that our statistic can be represented in terms of the generating functional and that it is related to the spatio-temporal *K*-function. We further discuss its explicit form under some specific model assumptions and derive ratio-unbiased estimators. We finally illustrate the use of our statistic in practice.

Partial linear models have been widely used as a flexible method for modelling linear components in conjunction with non-parametric ones. Despite the presence of the non-parametric part, the linear, parametric part can, under certain conditions, be estimated at the parametric rate. In this paper, we consider a high-dimensional linear part. We show that it can be estimated with oracle rates, using the least absolute shrinkage and selection operator penalty for the linear part and a smoothness penalty for the non-parametric part.

Assessing the absolute risk for a future disease event in presently healthy individuals has an important role in the primary prevention of cardiovascular diseases (CVD) and other chronic conditions. In this paper, we study the use of non-parametric Bayesian hazard regression techniques and posterior predictive inferences in the risk assessment task. We generalize our previously published Bayesian multivariate monotonic regression procedure to a survival analysis setting, combined with a computationally efficient estimation procedure utilizing case–base sampling. To achieve parsimony in the model fit, we allow for multidimensional relationships within specified subsets of risk factors, determined either on *a priori* basis or as a part of the estimation procedure. We apply the proposed methods for 10-year CVD risk assessment in a Finnish population.

We extend the log-mean linear parameterization for binary data to discrete variables with an arbitrary number of levels and show that, in this case too, it can be used to parameterize bi-directed graph models. Furthermore, we show that the log-mean linear parameterization allows one to simultaneously represent marginal independencies among variables and marginal independencies that appear only when certain levels are collapsed into a single one. We illustrate the application of this property by means of an example based on genetic association studies involving single-nucleotide polymorphisms. More generally, this feature provides a natural way to reduce the parameter count, while preserving the independence structure, by means of substantive constraints that give additional insight into the association structure of the variables.