We study errors-in-variables problems when the response is binary and instrumental variables are available. We construct consistent estimators by exploiting the prediction relation between the unobservable variables and the instruments. The asymptotic properties of the new estimators are established and illustrated through simulation studies. We also demonstrate that the method extends readily to generalized linear models and beyond. The usefulness of the method is illustrated through a real data example.

Informative dropout is a vexing problem for any biomedical study. Most existing statistical methods attempt to correct estimation bias related to this phenomenon by specifying unverifiable assumptions about the dropout mechanism. We consider a cohort study in Africa that uses an outreach programme to ascertain the vital status for dropout subjects. These data can be used to identify a number of relevant distributions. However, as only a subset of dropout subjects were followed, vital status ascertainment was incomplete. We use semi-competing risk methods as our analysis framework to address this specific case where the terminal event is incompletely ascertained and consider various procedures for estimating the marginal distribution of dropout and the marginal and conditional distributions of survival. We also consider model selection and estimation efficiency in our setting. Performance of the proposed methods is demonstrated via simulations, asymptotic study and analysis of the study data.

We study estimation and prediction in linear models where the response and the regressor variable both take values in some Hilbert space. Our main objective is to obtain consistency of a principal component-based estimator for the regression operator under minimal assumptions. In particular, we avoid some inconvenient technical restrictions that have been used throughout the literature. We develop our theory in a time-dependent setup that includes the autoregressive Hilbertian model as an important special case.

In this paper, we develop a semiparametric regression model for longitudinal skewed data. In the new model, we allow both the transformation function and the baseline function to be unknown. The proposed model provides a much broader class of models than the existing additive and multiplicative models. Our estimators for the regression parameters, the transformation function and the baseline function are asymptotically normal. In particular, the estimator of the transformation function converges to its true value at the rate *n*^{−1∕2}, the convergence rate one would expect for a parametric model. In simulation studies, we demonstrate that the proposed semiparametric method is robust with little loss of efficiency. Finally, we apply the new method to a study on longitudinal health care costs.

In this paper, we consider the deterministic trend model where the error process is allowed to be weakly or strongly correlated and subject to non-stationary volatility. Extant estimators of the trend coefficient are analysed. We find that under heteroskedasticity, the Cochrane–Orcutt-type estimator (with some initial condition) could be less efficient than Ordinary Least Squares (OLS) when the process is highly persistent, whereas it is asymptotically equivalent to OLS when the process is less persistent. An efficient non-parametrically weighted Cochrane–Orcutt-type estimator is then proposed. The efficiency is uniform over weak or strong serial correlation and non-stationary volatility of unknown form. The feasible estimator relies on non-parametric estimation of the volatility function, and the asymptotic theory is provided. We use the data-dependent smoothing bandwidth that can automatically adjust for the strength of non-stationarity in volatilities. The implementation does not require pretesting persistence of the process or specification of non-stationary volatility. Finite-sample evaluation via simulations and an empirical application demonstrates the good performance of proposed estimators.

The aim of this article is to develop methodology for detecting influential observations in crossover models with random individual effects. Various case-weighted perturbations are performed. We obtain the influence of the perturbations on each parameter estimator and on their dispersion matrices. The results show that closed-form expressions for the influence can be obtained via the residuals in mixed linear models. Some graphical tools are also presented.

This paper discusses regression analysis of current status or case I interval-censored failure time data arising from the additive hazards model. In this situation, some covariates could be missing because of various reasons, but there may exist some auxiliary information about the missing covariates. To address the problem, we propose an estimated partial likelihood approach for estimation of regression parameters, which makes use of the available auxiliary information. The method can be easily implemented, and the asymptotic properties of the resulting estimates are established. To assess the finite sample performance of the proposed method, an extensive simulation study is conducted and indicates that the method works well.

In this paper, we consider the problem of testing for a parameter change in Poisson autoregressive models. We suggest two types of cumulative sum (CUSUM) tests, namely, those based on estimates and residuals. We first demonstrate that the conditional maximum likelihood estimator (CMLE) is strongly consistent and asymptotically normal and then construct the CMLE-based CUSUM test. It is shown that under regularity conditions, its limiting null distribution is a function of independent Brownian bridges. Next, we construct the residual-based CUSUM test and derive its limiting null distribution. Simulation results are provided for illustration. A real-data analysis is performed on data for polio incidence and campylobacteriosis infections.
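The residual-based CUSUM idea can be sketched compactly. Below is a minimal, generic illustration of a CUSUM statistic computed from residuals; the standardization used here is an illustrative assumption, and the model-specific residual construction for Poisson autoregressions is given in the paper, not reproduced here.

```python
# A minimal sketch of a residual-based CUSUM statistic for detecting a
# parameter change. The standardization is a generic illustration; the
# residual construction for Poisson autoregressions is model-specific.
import math
import random

def cusum_statistic(residuals):
    """max_k |S_k - (k/n) S_n| / (sigma_hat * sqrt(n)) over partial sums S_k."""
    n = len(residuals)
    mean = sum(residuals) / n
    var = sum((r - mean) ** 2 for r in residuals) / n
    sigma = math.sqrt(var) if var > 0 else 1.0
    total = sum(residuals)
    stat, s = 0.0, 0.0
    for k, r in enumerate(residuals, start=1):
        s += r
        stat = max(stat, abs(s - (k / n) * total))
    return stat / (sigma * math.sqrt(n))

random.seed(0)
no_change = [random.gauss(0, 1) for _ in range(200)]
with_change = [random.gauss(0, 1) for _ in range(100)] + \
              [random.gauss(2, 1) for _ in range(100)]
# A mean shift halfway through inflates the statistic.
print(cusum_statistic(no_change) < cusum_statistic(with_change))
```

Under the null, such statistics converge to functionals of Brownian bridges, which is how critical values are obtained.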

We consider the problem of estimating the proportion *θ* of true null hypotheses in a multiple testing context. The setup is classically modelled through a semiparametric mixture with two components: a uniform distribution on interval
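The truncated sentence above refers to the classical two-group model for *p*-values. For background, a standard formulation in this literature (not a quotation of the paper) writes the density of a *p*-value as

```latex
f(x) = \theta \, \mathbf{1}_{[0,1]}(x) + (1 - \theta)\, g(x), \qquad x \in [0,1],
```

where the uniform component corresponds to the true null hypotheses and *g* is the unknown density under the alternative.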

We propose a new model for multivariate Markov chains of order one or higher on the basis of the mixture transition distribution (MTD) model. We call it the MTD-Probit. The proposed model presents two attractive features: it is completely free of constraints, thereby facilitating the estimation procedure, and it is more precise at estimating the transition probabilities of a multivariate or higher-order Markov chain than the standard MTD model.
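The MTD structure can be made concrete with a toy computation for a second-order chain on two states: the law of *X*_t given (*X*_{t−1}, *X*_{t−2}) is a convex combination of first-order transition rows. The matrix and weights below are made-up illustrative values; the MTD-Probit's contribution is to reparameterize such weights so that no simplex constraints are needed during estimation.

```python
# Toy mixture transition distribution (MTD) computation. Q and weights are
# made-up illustrative values, not estimates from any data set.
def mtd_prob(x_t, x_tm1, x_tm2, Q, weights):
    """P(X_t = x_t | X_{t-1} = x_tm1, X_{t-2} = x_tm2) under the MTD model."""
    return sum(w * Q[lag][x_t] for w, lag in zip(weights, (x_tm1, x_tm2)))

Q = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # shared transition matrix
weights = [0.6, 0.4]                            # lag weights on the simplex
print(mtd_prob(1, 0, 1, Q, weights))
```

Because the weights lie on the simplex and each row of Q is a distribution, the mixture rows are automatically valid conditional distributions.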

We tackle an important although rarely addressed question of accounting for a variety of asymmetries frequently observed in stochastic temporal/spatial records. First, we review some measures intended to capture such asymmetries that have been introduced on various occasions in the past and then propose a family of measures motivated by Rice's formula for crossing level distributions of the slope. We utilize these asymmetry measures to demonstrate how a class of second-order models built on the skewed Laplace distributions can account for sample path asymmetries. It is shown that these models are capable of mimicking not only distributional skewness but also more complex geometrical asymmetries in the sample path, such as tilting, front-back slope asymmetry and time irreversibility. Simple moment-based estimation techniques are briefly discussed to allow direct application to modelling and fitting actual records.

Latent variable models have been widely used for modelling the dependence structure of multiple outcomes data. However, the formulation of a latent variable model is often unknown *a priori*, and misspecification will distort the dependence structure and lead to unreliable model inference. Moreover, multiple outcomes with varying types present enormous analytical challenges. In this paper, we present a class of general latent variable models that can accommodate mixed types of outcomes. We propose a novel selection approach that simultaneously selects latent variables and estimates parameters. We show that the proposed estimator is consistent, asymptotically normal and has the oracle property. The practical utility of the methods is confirmed via simulations as well as an application to the analysis of the World Values Survey, a global research project that explores people's values and beliefs and the social and personal characteristics that might influence them.

We investigate the effect of measurement error on principal component analysis in the high-dimensional setting. The effects of random, additive errors are characterized by the expectation and variance of the changes in the eigenvalues and eigenvectors. The results show that the impact of uncorrelated measurement error on the principal component scores is mainly in terms of increased variability and not bias. In practice, the error-induced increase in variability is small compared with the original variability for the components corresponding to the largest eigenvalues. This suggests that the impact will be negligible when these component scores are used in classification and regression or for visualizing data. However, the measurement error will contribute to a large variability in component loadings, relative to the loading values, such that interpretation based on the loadings can be difficult. The results are illustrated by simulating additive Gaussian measurement error in microarray expression data from cancer tumours and control tissues.

Consider testing multiple hypotheses using tests that can only be evaluated by simulation, such as permutation tests or bootstrap tests. This article introduces MMCTest, a sequential algorithm that gives, with arbitrarily high probability, the same classification as a specific multiple testing procedure applied to ideal *p*-values. The method can be used with a class of multiple testing procedures that include the Benjamini and Hochberg false discovery rate procedure and the Bonferroni correction controlling the familywise error rate. One of the key features of the algorithm is that it stops sampling for all the hypotheses that can already be decided as being rejected or non-rejected. MMCTest can be interrupted at any stage and then returns three sets of hypotheses: the rejected, the non-rejected and the undecided hypotheses. A simulation study motivated by actual biological data shows that MMCTest is usable in practice and that, despite the additional guarantee, it can be computationally more efficient than other methods.
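For reference, this is the Benjamini and Hochberg step-up procedure that MMCTest can target, applied here to ideal *p*-values. It is the standard textbook procedure, not the sequential Monte Carlo algorithm itself.

```python
# Standard Benjamini-Hochberg step-up procedure on ideal p-values; the
# p-values in the example are arbitrary illustrative numbers.
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the set of indices rejected at false discovery rate alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    return set(order[:k_max])

pvals = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
print(sorted(benjamini_hochberg(pvals)))  # rejects the two smallest p-values
```

MMCTest replaces each ideal *p*-value with Monte Carlo information and, as described above, stops sampling for a hypothesis once its classification under such a procedure can be decided.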

Approximate Bayesian computation (ABC) is a popular technique for analysing data for complex models where the likelihood function is intractable. It involves using simulation from the model to approximate the likelihood, with this approximate likelihood then being used to construct an approximate posterior. In this paper, we consider methods that estimate the parameters by maximizing the approximate likelihood used in ABC. We give a theoretical analysis of the asymptotic properties of the resulting estimator. In particular, we derive results analogous to those of consistency and asymptotic normality for standard maximum likelihood estimation. We also discuss how sequential Monte Carlo methods provide a natural method for implementing our likelihood-based ABC procedures.

This paper presents a non-parametric method for estimating the conditional density associated with the jump rate of a piecewise-deterministic Markov process. In our framework, the estimation requires only one observation of the process within a long time interval. Our method relies on a generalization of Aalen's multiplicative intensity model. We prove the uniform consistency of our estimator under some reasonable assumptions related to the primitive characteristics of the process. A simulation study illustrates the behaviour of our estimator.

Many studies demonstrate that inference for the parameters arising in portfolio optimization often fails. The recent literature shows that this phenomenon is mainly due to a high-dimensional asset universe. Typically, such a universe is described by the asymptotic regime in which the sample size *n* + 1 and the sample dimension *d* both go to infinity while *d* ∕ *n* → *c* ∈ (0,1). In this paper, we analyse the estimators for the excess returns' mean and variance, the weights and the Sharpe ratio of the global minimum variance portfolio under these asymptotics with respect to consistency and asymptotic distribution. Problems of stating hypotheses in high dimension are also discussed. The applicability of the results is demonstrated by an empirical study. Copyright © 2014 John Wiley & Sons, Ltd.

This paper introduces a general framework for testing hypotheses about the structure of the mean function of complex functional processes. Important particular cases of the proposed framework are as follows: (1) testing the null hypothesis that the mean of a functional process is parametric against a general alternative modelled by penalized splines; and (2) testing the null hypothesis that the means of two possibly correlated functional processes are equal or differ by only a simple parametric function. A global pseudo-likelihood ratio test is proposed, and its asymptotic distribution is derived. The size and power properties of the test are confirmed in realistic simulation scenarios. Finite-sample power results indicate that the proposed test is much more powerful than competing alternatives. Methods are applied to testing the equality between the means of normalized *δ*-power of sleep electroencephalograms of subjects with sleep-disordered breathing and matched controls.

In this paper, we introduce a new risk measure, the so-called conditional tail moment. It is defined as the moment of order *a* ≥ 0 of the loss distribution above the upper *α*-quantile, where *α* ∈ (0,1). Estimating the conditional tail moment permits us to estimate all risk measures based on conditional moments, such as the conditional tail expectation, conditional value at risk or conditional tail variance. Here, we focus on the estimation of these risk measures in the case of extreme losses (where *α* is no longer fixed but tends to 0). It is moreover assumed that the loss distribution is heavy tailed and depends on a covariate. The estimation method thus combines non-parametric kernel methods with extreme-value statistics. The asymptotic distribution of the estimators is established, and their finite-sample behaviour is illustrated both on simulated data and on a real data set of daily rainfalls.
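To fix ideas, here is a naive empirical version of the conditional tail moment for a fixed *α*: the moment of order *a* of the losses above the upper *α*-quantile. The paper's setting (*α* tending to 0, heavy tails, a covariate) requires the kernel and extreme-value machinery instead; this sketch only illustrates the quantity being estimated.

```python
# Naive empirical conditional tail moment at a fixed alpha; illustrative only,
# not the paper's extreme-value estimator.
import math

def conditional_tail_moment(losses, a, alpha):
    """Average of x**a over the ceil(alpha * n) largest observations."""
    k = max(1, math.ceil(alpha * len(losses)))
    tail = sorted(losses, reverse=True)[:k]
    return sum(x ** a for x in tail) / k

# a = 1 recovers the conditional tail expectation.
print(conditional_tail_moment(list(range(1, 11)), 1, 0.2))  # mean of {9, 10}
```

Choosing *a* = 2 together with *a* = 1 yields the conditional tail variance via the usual moment identity.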

Various exact tests for statistical inference are available to provide powerful and accurate decision rules, provided that the corresponding critical values are tabulated or evaluated via Monte Carlo methods. This article introduces a novel hybrid method for computing *p*-values of exact tests by combining Monte Carlo simulations and statistical tables generated *a priori*. To use the data from Monte Carlo generations and tabulated critical values jointly, we employ kernel density estimation within Bayesian-type procedures. The *p*-values are linked to the posterior means of quantiles. In this framework, we present relevant information from the Monte Carlo experiments via likelihood-type functions, whereas tabulated critical values are used to reflect prior distributions. The local maximum likelihood technique is employed to compute functional forms of prior distributions from statistical tables. Empirical likelihood functions are proposed to replace parametric likelihood functions within the structure of the posterior mean calculations to provide a Bayesian-type procedure with a distribution-free set of assumptions. We derive the asymptotic properties of the proposed nonparametric posterior means of quantiles process. Using the theoretical propositions, we calculate the minimum number of Monte Carlo resamples needed for a desired level of accuracy on the basis of distances between actual data characteristics (e.g. sample sizes) and characteristics of data used to present corresponding critical values in a table. The proposed approach makes practical applications of exact tests simple and rapid. Implementations of the proposed technique are easily carried out via the recently developed STATA and R statistical packages.

We propose localized spectral estimators for the quadratic covariation and the spot covolatility of diffusion processes, which are observed discretely with additive observation noise. The appropriate estimation for time-varying volatilities is based on an asymptotic equivalence of the underlying statistical model to a white-noise model with correlation and volatility processes being constant over small time intervals. The asymptotic equivalence of the continuous-time and discrete-time experiments is proved by a construction with linear interpolation in one direction and local means for the other. The new estimator outperforms earlier non-parametric methods in the literature for the considered model. We investigate its finite sample size characteristics in simulations and draw a comparison between various proposed methods.

In this paper, we propose and study a new global test, namely the GPF test, for the one-way ANOVA problem for functional data, obtained via globalizing the usual pointwise *F*-test. The asymptotic random expressions of the test statistic are derived, and its asymptotic power is investigated. The GPF test is shown to be root-*n* consistent. It is much less computationally intensive than a parametric bootstrap test proposed in the literature for the one-way ANOVA for functional data. Via some simulation studies, it is found that, in terms of size control and power, the GPF test is comparable with two existing tests adopted for the one-way ANOVA problem for functional data. A real data example illustrates the GPF test.
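The globalization step admits a compact description. Writing *F*_n(*t*) for the usual pointwise *F*-statistic at point *t* of the domain 𝒯, a global statistic of the GPF type (notation ours, as a hedged reading of the construction) is

```latex
T_n = \int_{\mathcal{T}} F_n(t)\, dt,
```

with rejection for large values of *T*_n; the asymptotic random expression of such a statistic is what the paper derives.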

It has been shown in the literature that the Lasso estimator, or *ℓ*_{1}-penalized least squares estimator, enjoys good oracle properties. This paper examines which special properties of the *ℓ*_{1}-penalty allow for sharp oracle results and then extends the situation to general norm-based penalties that satisfy a weak decomposability condition.
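For concreteness, here is a bare-bones coordinate-descent Lasso: each coordinate update is the soft-thresholding operator that the *ℓ*_{1} penalty induces. This sketches the estimator whose oracle properties are at issue; it is not part of the paper's theoretical contribution, and the data in the example are arbitrary.

```python
# Minimal coordinate-descent Lasso; the design matrix and response below are
# arbitrary illustrative values.
def soft_threshold(z, t):
    return max(z - t, 0.0) - max(-z - t, 0.0)

def lasso(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual (beta_j removed)
            r_j = sum(X[i][j] * (y[i] - sum(X[i][k] * beta[k]
                                            for k in range(p) if k != j))
                      for i in range(n))
            norm_j = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(r_j / n, lam) / (norm_j / n)
    return beta

X = [[1, 0], [1, 0], [0, 1], [0, 1]]
y = [2.0, 2.0, 0.1, -0.1]
beta_hat = lasso(X, y, lam=0.1)
print(beta_hat)  # second coefficient is set exactly to zero
```

The exact zero in the second coordinate is the sparsity the *ℓ*_{1} penalty produces and is the behaviour the weak decomposability condition generalizes to other norms.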

In cancer diagnosis studies, high-throughput gene profiling has been extensively conducted, searching for genes whose expressions may serve as markers. Data generated from such studies have the ‘large *d*, small *n*’ feature, with the number of genes profiled much larger than the sample size. Penalization has been extensively adopted for simultaneous estimation and marker selection. Because of small sample sizes, markers identified from the analysis of single data sets can be unsatisfactory. A cost-effective remedy is to conduct integrative analysis of multiple heterogeneous data sets. In this article, we investigate composite penalization methods for estimation and marker selection in integrative analysis. The proposed methods use the minimax concave penalty (MCP) as the outer penalty. Under the homogeneity model, the ridge penalty is adopted as the inner penalty. Under the heterogeneity model, the Lasso penalty and MCP are adopted as the inner penalty. Effective computational algorithms based on coordinate descent are developed. Numerical studies, including simulation and analysis of practical cancer data sets, show satisfactory performance of the proposed methods.
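The outer penalty mentioned above has a simple closed form. The following is the standard definition of the MCP penalty function, with regularization parameter `lam` and concavity parameter `gamma`; the parameter values in the example are arbitrary.

```python
# Standard minimax concave penalty (MCP) function; parameter values in the
# example are arbitrary illustrative choices.
def mcp(t, lam, gamma):
    """MCP penalty: tapered l1 up to |t| = gamma*lam, constant afterwards."""
    a = abs(t)
    if a <= gamma * lam:
        return lam * a - t * t / (2 * gamma)
    return gamma * lam ** 2 / 2

# Near zero MCP behaves like the Lasso penalty; large coefficients receive a
# constant penalty, so they are not shrunk at the margin.
print(mcp(0.1, 1.0, 3.0), mcp(10.0, 1.0, 3.0))
```

This flattening for large coefficients is what reduces the estimation bias that a pure *ℓ*_{1} outer penalty would incur.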

This article is devoted to the construction and asymptotic study of adaptive, group-sequential, covariate-adjusted randomized clinical trials analysed through the prism of the semiparametric methodology of targeted maximum likelihood estimation. We show how to build, as the data accrue group-sequentially, a sampling design that targets a user-supplied optimal covariate-adjusted design. We also show how to carry out sound statistical inference based on such an adaptive sampling scheme (therefore extending some results known in the independent and identically distributed setting only so far), and how group-sequential testing applies on top of it. The procedure is robust (i.e. consistent even if the working model is mis-specified). A simulation study confirms the theoretical results and validates the conjecture that the procedure may also be efficient.

A simple summary of a treatment effect is attractive; this partly explains the success of the Cox model when analysing time-to-event data, since the relative risk is such a convenient summary measure. In practice, however, the Cox model may fail to give a reasonable fit, very often because of a time-changing treatment effect. The Aalen additive hazards model may be a good alternative, as time-changing effects are easily modelled within this model, but results are then evidently more complicated to communicate. In such situations, the odds of concordance measure (OC) is a convenient way of communicating results, and recently Martinussen & Pipper (2012) showed how a variant of the OC measure may be estimated based on the Aalen additive hazards model. In this study, we propose an estimator that should be preferred in observational studies, as it always estimates the causal effect on the chosen scale, assuming only that there are no unmeasured confounders. The resulting estimator is shown to be consistent and asymptotically normal, and an estimator of its limiting variance is provided. Two real applications are presented.

For several independent multivariate bioassays performed at different laboratories or locations, the problem of testing the homogeneity of the relative potencies is addressed, assuming the usual slope-ratio or parallel line assay model. When the homogeneity hypothesis holds, interval estimation of the common relative potency is also addressed. These problems have been investigated in the literature using likelihood-based methods, under the assumption of a common covariance matrix across the different studies. This assumption is relaxed in this investigation. Numerical results show that the usual likelihood-based procedures are inaccurate for both of the above problems, in terms of providing inflated type I error probabilities for the homogeneity test, and providing coverage probabilities below the nominal level for the interval estimation of the common relative potency, unless the sample sizes are large, as expected. Correction based on small sample asymptotics is investigated in this article, and this provides significantly more accurate results in the small sample scenario. The results are also illustrated with examples.

In multistate survival analysis, the sojourn of a patient through various clinical states is shown to correspond to the diffusion of 1 C of electrical charge through an electrical network. The essential comparison matches differentials of probability for the patient with differentials of charge, and it equates clinical states with electrical nodes. Indeed, if the death state of the patient corresponds to the sink node of the circuit, then the transient current that would be seen on an oscilloscope as the sink output is a plot of the probability density for the survival time of the patient. This electrical circuit analogy is further explored by considering the simplest possible survival model with two clinical states, alive and dead (sink), that incorporates censoring and truncation. The sink output seen on an oscilloscope is then a plot of the Kaplan–Meier mass function. Thus, the Kaplan–Meier estimator finds motivation in the dynamics of current flow, as a fundamental physical law, rather than as a nonparametric maximum likelihood estimate (MLE). Generalization to competing risks settings with multiple death states (sinks) leads to cause-specific Kaplan–Meier submass functions as outputs at sink nodes. With covariates present, the electrical analogy provides an intuitive understanding of partial likelihood and of various baseline hazard estimates often used with the proportional hazards model.
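To make concrete the estimator that the circuit analogy re-motivates, here is a standard product-limit (Kaplan–Meier) implementation for right-censored data; the electrical formulation itself is in the paper, and the data in the example are arbitrary.

```python
# Standard Kaplan-Meier product-limit estimator; example data are arbitrary.
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each event time; events[i] = 1 marks a death."""
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

# One censored observation at t = 2 shrinks the risk set without a drop in S.
print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

The successive drops of the survival curve are exactly the Kaplan–Meier mass function that, in the analogy, appears as the transient current at the sink node.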

In this paper, we propose a general class of Gamma frailty transformation models for multivariate survival data. The transformation class includes the commonly used proportional hazards and proportional odds models. The proposed class also includes a family of cure rate models. Under an improper prior for the parameters, we establish propriety of the posterior distribution. A novel Gibbs sampling algorithm is developed for sampling from the observed data posterior distribution. A simulation study is conducted to examine the properties of the proposed methodology. An application to a data set from a cord blood transplantation study is also reported.

This paper deals with the study of dependencies between two given events modelled by point processes. In particular, we focus on the context of DNA, where we aim to detect favoured or avoided distances between two given motifs along a genome, suggesting possible interactions at a molecular level. For this, we naturally introduce a so-called reproduction function *h* that allows us to quantify the favoured positions of the motifs and that is considered as the intensity of a Poisson process. Our first interest is the estimation of this function *h*, assumed to be well localized. The estimator, based on random thresholds, achieves an oracle inequality. Minimax properties of the estimator on Besov balls are then established. Some simulations are provided, demonstrating the good practical behaviour of our procedure. Finally, our method is applied to the analysis of the dependence between promoter sites and genes along the genome of the *Escherichia coli* bacterium.

We consider in this paper the semiparametric mixture of two unknown distributions that are equal up to a location parameter. The model is said to be semiparametric in the sense that the mixed distribution is not supposed to belong to a parametric family. To ensure the identifiability of the model, it is assumed that the mixed distribution is zero-symmetric, the model being then defined by the mixing proportion, two location parameters and the probability density function of the mixed distribution. We propose a new class of *M*-estimators of these parameters based on a Fourier approach and prove that they are root-*n* consistent under mild regularity conditions. Their finite sample properties are illustrated by a Monte Carlo study, and a benchmark real dataset is also studied with our method.
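In symbols, the model described above can be written (notation ours) as

```latex
g(x) = p\, f(x - \mu_1) + (1 - p)\, f(x - \mu_2),
```

where *p* ∈ (0,1) is the mixing proportion, *μ*_1 and *μ*_2 are the two location parameters, and *f* is the unknown zero-symmetric mixed density.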

We study the maxiset performance of a large collection of block thresholding wavelet estimators, namely the *horizontal block thresholding family*. We provide sufficient conditions on the choices of rates and threshold values to ensure that the adaptive estimators involved attain large maxisets. Moreover, we prove that any estimator of such a family reconstructs the Besov balls with a near-minimax optimal rate that can be faster than that of any separable thresholding estimator. We then identify, in particular cases, the best estimator of such a family, that is, the one associated with the largest maxiset. As a distinctive feature of this paper, we propose a refined approach that models method-dependent threshold values. By a series of simulation studies, we confirm the good performance of the best estimator by comparing it with the other members of its family.

In this paper, we study the problem of testing the hypothesis on whether the density *f* of a random variable on a sphere belongs to a given parametric class of densities. We propose two test statistics based on the *L*^{2} and *L*^{1} distances between a non-parametric density estimator adapted to circular data and a smoothed version of the specified density. The asymptotic distribution of the *L*^{2} test statistic is provided under the null hypothesis and contiguous alternatives. We also consider a bootstrap method to approximate the distribution of both test statistics. Through a simulation study, we explore the moderate sample performance of the proposed tests under the null hypothesis and under different alternatives. Finally, the procedure is illustrated by analysing a real data set based on wind direction measurements.