The paper proposes a new test for detecting the umbrella pattern under a general non-parametric scheme. The alternative asserts that the umbrella ordering holds, while the null hypothesis is its complement. The main focus is on controlling the power function of the test outside the alternative. As a result, the asymptotic type I error of the constructed solution is smaller than or equal to the fixed significance level *α* on the whole set where the umbrella ordering does not hold. Under finite sample sizes, this error is also controlled to a satisfactory extent. A simulation study shows, among other things, that the new test improves upon the solution widely recommended in the literature on the subject. A routine, written in R, is attached as the Supporting Information file.

For multivariate survival data, we study the generalized method of moments (GMM) approach to estimation and inference based on the marginal additive hazards model. We propose an efficient iterative algorithm using closed-form solutions, which dramatically reduces the computational burden. Asymptotic normality of the proposed estimators is established, and the corresponding variance–covariance matrix can be consistently estimated. Inference procedures are derived based on the asymptotic chi-squared distribution of the GMM objective function. Simulation studies are conducted to empirically examine the finite sample performance of the proposed method, and a real data example from a dental study is used for illustration.

An extended single-index model is considered when responses are missing at random. A three-step estimation procedure is developed to define an estimator for the single-index parameter vector by a joint estimating equation. The proposed estimator is shown to be asymptotically normal. An algorithm for computing this estimator is proposed. This algorithm only involves one-dimensional nonparametric smoothers, thereby avoiding the data sparsity problem caused by high model dimensionality. Some simulation studies are conducted to investigate the finite sample performance of the proposed estimators.

Log-normal linear regression models are popular in many fields of research. Bayesian estimation of the conditional mean of the dependent variable is problematic as many choices of the prior for the variance (on the log-scale) lead to posterior distributions with no finite moments. We propose a generalized inverse Gaussian prior for this variance and derive the conditions on the prior parameters that yield posterior distributions of the conditional mean of the dependent variable with finite moments up to a pre-specified order. The conditions depend on one of the three parameters of the suggested prior; the other two have an influence on inferences for small and medium sample sizes. A second goal of this paper is to discuss how to choose these parameters according to different criteria including the optimization of frequentist properties of posterior means.

Influential units occur frequently in surveys, especially in business surveys that collect economic variables whose distributions are highly skewed. A unit is said to be influential when its inclusion or exclusion from the sample has an important impact on the sampling error of estimates. We extend the concept of conditional bias attached to a unit and propose a robust version of the double expansion estimator, which depends on a tuning constant. We determine the tuning constant that minimizes the maximum estimated conditional bias. Our results can be naturally extended to the case of unit nonresponse, the set of respondents often being viewed as a second-phase sample. A robust version of calibration estimators, based on auxiliary information available at both phases, is also constructed.

We are concerned with a situation in which we would like to test multiple hypotheses with tests whose *p*-values cannot be computed explicitly but can be approximated using Monte Carlo simulation. This scenario occurs widely in practice. We are interested in obtaining the same rejections and non-rejections as the ones obtained if the *p*-values for all hypotheses had been available. The present article introduces a framework for this scenario by providing a generic algorithm for a general multiple testing procedure. We establish conditions that guarantee that the rejections and non-rejections obtained through Monte Carlo simulations are identical to the ones obtained with the *p*-values. Our framework is applicable to a general class of step-up and step-down procedures, which includes many established multiple testing corrections such as those of Bonferroni, Holm, Šidák, Hochberg and Benjamini–Hochberg. Moreover, we show how to use our framework to improve algorithms available in the literature in such a way as to yield theoretical guarantees on their results. These modifications can easily be implemented in practice and lead to a particular way of reporting multiple testing results as three sets together with an error bound on their correctness, which we demonstrate using a real biological dataset.
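As a toy illustration of the basic ingredients (this is not the article's algorithm, and all data below are made up), the following sketch computes Monte Carlo *p*-values with the standard +1 correction and applies the Holm step-down correction to them.

```python
import random

def mc_pvalue(t_obs, null_draws):
    """Monte Carlo p-value with the +1 correction, which guarantees
    a valid (slightly conservative) p-value for any number of draws."""
    exceed = sum(1 for t in null_draws if t >= t_obs)
    return (1 + exceed) / (1 + len(null_draws))

def holm(pvals, alpha=0.05):
    """Holm step-down procedure: returns rejection decisions."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

rng = random.Random(1)
t_obs = [4.2, 3.9, 0.3]  # two strong signals, one null
null_draws = [[rng.gauss(0, 1) for _ in range(999)] for _ in t_obs]
pvals = [mc_pvalue(t, d) for t, d in zip(t_obs, null_draws)]
print(pvals, holm(pvals))
```

Under the article's framework, one would additionally track how many Monte Carlo draws are needed before such decisions are guaranteed to coincide with the ones based on the exact *p*-values.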

Linear structural equation models, which relate random variables via linear interdependencies and Gaussian noise, are a popular tool for modelling multivariate joint distributions. The models correspond to mixed graphs that include both directed and bidirected edges representing the linear relationships and correlations between noise terms, respectively. A question of interest for these models is that of parameter identifiability: whether or not it is possible to recover edge coefficients from the joint covariance matrix of the random variables. For the problem of determining generic parameter identifiability, we present an algorithm building upon the half-trek criterion. Underlying our new algorithm is the idea that ancestral subsets of vertices in the graph can be used to extend the applicability of a decomposition technique.

Compositional tables – a continuous counterpart to contingency tables – carry relative information about the relationships between row and column factors; thus, for their analysis, only ratios between cells of a table are informative. Consequently, the standard Euclidean geometry should be replaced by the Aitchison geometry on the simplex, which enables decomposition of the table into its independent and interactive parts. The aim of the paper is to find an interpretable coordinate representation for independent and interaction tables (in the sense of balances and odds ratios of cells, respectively), in which further statistical processing of compositional tables can be performed. Theoretical results are applied to real-world problems from a health survey and from macroeconomics.

Frame corrections have been studied in census applications for a long time. One very promising method is *dual system estimation*, which is based on capture–recapture models. These methods have been applied recently in the USA, England, Israel and Switzerland. In order to gain information on subgroups of the population, structure preserving estimators can be applied [i.e. structure preserving estimation (SPREE) and generalized SPREE]. The present paper extends the SPREE approach with an alternative distance function, the chi-squared distance. The new method yielded improved estimates in our application with very small domains. A comparative study based on a large-scale Monte Carlo simulation elaborates on the advantages and disadvantages of the estimators in the context of the German register-assisted Census 2011.

We find the asymptotic distribution of the multi-dimensional multi-scale and kernel estimators for high-frequency financial data with microstructure. Sampling times are allowed to be asynchronous and endogenous. In the process, we show that the classes of multi-scale and kernel estimators for smoothing noise perturbation are asymptotically equivalent in the sense of having the same asymptotic distribution for corresponding kernel and weight functions. The theory leads to multi-dimensional stable central limit theorems and feasible versions. Hence, they allow one to draw statistical inference for a broad class of multivariate models, which paves the way to tests and confidence intervals in risk measurement for arbitrary portfolios composed of assets observed at high frequency. As an application, we extend the approach to construct a test for investigating the hypothesis that correlated assets are independent conditional on a common factor.

Linear increments (LI) are used to analyse repeated outcome data with missing values. Previously, two LI methods have been proposed, one allowing non-monotone missingness but not independent measurement error and one allowing independent measurement error but only monotone missingness. In both, it was suggested that the expected increment could depend on current outcome. We show that LI can allow non-monotone missingness and either independent measurement error of unknown variance or dependence of expected increment on current outcome but not both. A popular alternative to LI is a multivariate normal model ignoring the missingness pattern. This gives consistent estimation when data are normally distributed and missing at random (MAR). We clarify the relation between MAR and the assumptions of LI and show that for continuous outcomes multivariate normal estimators are also consistent under (non-MAR and non-normal) assumptions not much stronger than those of LI. Moreover, when missingness is non-monotone, they are typically more efficient.

We consider the problem of parameter estimation for inhomogeneous space-time shot-noise Cox point processes. We explore the possibility of using a stepwise estimation method and dimensionality-reducing techniques to estimate different parts of the model separately.

We discuss the estimation method using projection processes and propose a refined method that avoids projection onto the temporal domain. This remedies the main flaw of the method using projection processes – the possible overlapping, in the projection process, of clusters that are clearly separated in the original space-time process. This issue is more prominent in the temporal projection process, where the amount of information lost by projection is higher than in the spatial projection process.

For the refined method, we derive consistency and asymptotic normality results under the increasing domain asymptotics and appropriate moment and mixing assumptions. We also present a simulation study that suggests that cluster overlapping is successfully overcome by the refined method.

Regression discontinuity designs (RD designs) are used as a method for causal inference from observational data, where the decision to apply an intervention is made according to a ‘decision rule’ that is linked to some continuous variable. Such designs are being increasingly developed in medicine. The local average treatment effect (LATE) has been established as an estimator of the intervention effect in an RD design, particularly where a design's ‘decision rule’ is not adhered to strictly. Estimating the variance of the LATE is not necessarily straightforward. We consider three approaches to the estimation of the LATE: two-stage least squares, likelihood-based and a Bayesian approach. We compare these under a variety of simulated RD designs and a real example concerning the prescription of statins based on cardiovascular disease risk score.
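As a rough sketch of the simplest of these approaches (a local Wald ratio, which coincides with two-stage least squares using only an intercept and the cutoff indicator; all data, the cutoff and the bandwidth below are hypothetical, and local smoothing is omitted):

```python
def late_wald(z, d, y, cutoff, h):
    """Local Wald estimator of the LATE in a fuzzy RD design: the jump in
    mean outcome at the cutoff divided by the jump in treatment take-up,
    using observations within a bandwidth h of the cutoff."""
    def mean(xs):
        return sum(xs) / len(xs)
    above = [(di, yi) for zi, di, yi in zip(z, d, y) if cutoff <= zi < cutoff + h]
    below = [(di, yi) for zi, di, yi in zip(z, d, y) if cutoff - h <= zi < cutoff]
    num = mean([yi for _, yi in above]) - mean([yi for _, yi in below])
    den = mean([di for di, _ in above]) - mean([di for di, _ in below])
    return num / den

# hypothetical data: risk score z, treatment indicator d, outcome y;
# the decision rule 'treat if z >= 0.5' is not adhered to strictly
z = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
d = [0, 0, 0, 1, 1, 1, 1, 0]
y = [1.0, 1.2, 1.1, 2.0, 3.0, 3.2, 3.1, 1.4]
print(late_wald(z, d, y, cutoff=0.5, h=0.5))  # (2.675 - 1.325) / (0.75 - 0.25) = 2.7
```

The likelihood-based and Bayesian approaches compared in the paper model the same two jumps but deliver different variance estimates for this ratio.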

Right-censored and length-biased failure time data arise in many fields including cross-sectional prevalent cohort studies, and their analysis has recently attracted a great deal of attention. It is well known that for regression analysis of failure time data, two commonly used approaches are hazard-based and quantile-based procedures, and most of the existing methods are hazard-based. In this paper, we consider quantile regression analysis of right-censored and length-biased data and present a semiparametric varying-coefficient partially linear model. For estimation of regression parameters, a three-stage procedure that makes use of the inverse probability weighting technique is developed, and the asymptotic properties of the resulting estimators are established. In addition, the approach allows the censoring variable to depend on covariates, whereas most existing methods assume independence between the censoring variables and covariates. A simulation study is conducted and suggests that the proposed approach works well in practical situations. Also, an illustrative example is provided.

Simultaneous confidence bands have been shown in the statistical literature to be powerful inferential tools in univariate linear regression. While the methodology of simultaneous confidence bands for univariate linear regression has been extensively researched and well developed, no published work seems available for multivariate linear regression. This paper fills this gap by studying one particular simultaneous confidence band for multivariate linear regression. Because of the shape of the band, the word ‘tube’ is more pertinent and so will be used to replace the word ‘band’. It is shown that the construction of the tube is related to the distribution of the largest eigenvalue. A simulation-based method is proposed to compute the 1 − *α* quantile of this eigenvalue. With the computation power of modern computers, the simultaneous confidence tube can be computed fast and accurately. A real-data example is used to illustrate the method, and many potential research problems are pointed out.

Functional data analysis has become an important area of research because of its ability to handle high-dimensional and complex data structures. However, the development is limited in the context of linear mixed effect models and, in particular, for small area estimation. The linear mixed effect models are the backbone of small area estimation. In this article, we consider area-level data and fit a varying coefficient linear mixed effect model where the varying coefficients are semiparametrically modelled via B-splines. We propose a method of estimating the fixed effect parameters and consider prediction of random effects that can be implemented using standard software. For measuring prediction uncertainties, we derive an analytical expression for the mean squared errors and propose a method of estimating the mean squared errors. The procedure is illustrated via a real data example, and operating characteristics of the method are judged using finite sample simulation studies.

Subgroup detection has received increasing attention recently in different fields such as clinical trials, public management and market segmentation analysis. In these fields, people often face time-to-event data, which are commonly subject to right censoring. This paper proposes a semiparametric Logistic-Cox mixture model for subgroup analysis when the outcome of interest is an event time subject to right censoring. The proposed method mainly consists of a likelihood ratio-based testing procedure for testing the existence of subgroups. The expectation–maximization iteration is applied to improve the testing power, and a model-based bootstrap approach is developed to implement the testing procedure. When subgroups exist, one can also use the proposed model to estimate the subgroup effect and construct predictive scores for subgroup membership. The large sample properties of the proposed method are studied. The finite sample performance of the proposed method is assessed by simulation studies. A real data example is also provided for illustration.

Pseudo-values have proven very useful for censored data analysis in complex settings such as multi-state models. The approach was originally suggested by Andersen *et al.* (*Biometrika*, 90, 2003, 335), who also proposed estimating standard errors using classical generalized estimating equation results. These results were studied more formally by Graw *et al.* (*Lifetime Data Anal.*, 15, 2009, 241), who derived some key results based on a second-order von Mises expansion. However, results concerning large sample properties of estimates based on regression models for pseudo-values still seem unclear. In this paper, we study these large sample properties in the simple setting of survival probabilities and show that the estimating function can be written as a U-statistic of second order, giving rise to an additional term that does not vanish asymptotically. We further show that previously advocated standard error estimates will typically be too large, although in many practical applications the difference will be of minor importance. We show how to correctly estimate the variability of the estimator. This is further studied in some simulation studies.
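To make the pseudo-value construction concrete, here is a small sketch for the survival probability in the simplest, uncensored case, where the jackknife pseudo-observation *n*θ̂ − (*n* − 1)θ̂^{(−i)} reduces to the indicator 1{*T*_{i} > *t*} (the data are made up):

```python
def pseudo_values(times, t):
    """Jackknife pseudo-observations for S(t) based on the empirical
    survivor function: n * S_hat - (n - 1) * S_hat_leave_one_out."""
    n = len(times)
    s_full = sum(ti > t for ti in times) / n
    pv = []
    for i in range(n):
        s_loo = sum(tj > t for j, tj in enumerate(times) if j != i) / (n - 1)
        pv.append(n * s_full - (n - 1) * s_loo)
    return pv

times = [2.0, 3.5, 1.1, 4.2, 0.7]
print(pseudo_values(times, t=2.5))  # reduces to the indicators 1{T_i > 2.5}
```

With censoring, θ̂ would be the Kaplan–Meier estimator and the pseudo-values would no longer be 0/1, which is where the U-statistic analysis of the paper becomes relevant.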

Let *X* be lognormal(*μ*,*σ*^{2}) with density *f*(*x*); let *θ* > 0 and define *L*(*θ*) = E[e^{−θX}]. We study properties of the exponentially tilted density (Esscher transform) *f*_{θ}(*x*) = e^{−θx}*f*(*x*)/*L*(*θ*), in particular its moments, its asymptotic form as *θ* → ∞ and asymptotics for the saddlepoint *θ*(*x*) determined by E_{θ(x)}[*X*] = *x*. The asymptotic formulas involve the Lambert W function. The established relations are used to provide two different numerical methods for evaluating the left tail probability of the sum of lognormals *S*_{n} = *X*_{1} + ⋯ + *X*_{n}: a saddlepoint approximation and an exponential tilting importance sampling estimator. For the latter, we demonstrate logarithmic efficiency. Numerical examples for the cdf *F*_{n}(*x*) and the pdf *f*_{n}(*x*) of *S*_{n} are given in a range of values of *σ*^{2}, *n* and *x* motivated by portfolio value-at-risk calculations.
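A rough numerical sketch of the tilted quantities for standard lognormal parameters (simple quadrature plus bisection, not the paper's saddlepoint approximation or importance sampling estimator):

```python
import math

def lognorm_pdf(x, mu=0.0, sigma=1.0):
    """Density of lognormal(mu, sigma^2)."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def tilted_moments(theta, n=5000, upper=50.0):
    """Approximate L(theta) = E[exp(-theta X)] and the tilted mean
    E_theta[X] by a Riemann sum over (0, upper]; the integrand
    vanishes at both endpoints, so this is effectively trapezoidal."""
    h = upper / n
    L = m1 = 0.0
    for i in range(1, n):
        x = i * h
        w = math.exp(-theta * x) * lognorm_pdf(x) * h
        L += w
        m1 += x * w
    return L, m1 / L

def saddlepoint(x_target, lo=1e-6, hi=100.0, tol=1e-6):
    """Solve E_theta[X] = x by bisection; the tilted mean is strictly
    decreasing in theta (its derivative is -Var_theta(X))."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if tilted_moments(mid)[1] > x_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

L0, m0 = tilted_moments(0.0)
print(L0, m0)            # close to 1 and E[X] = exp(1/2)
print(saddlepoint(0.5))  # theta at which the tilted mean equals 0.5
```

The paper's point is that, in the lognormal left tail, such quantities admit explicit asymptotics via the Lambert W function instead of this brute-force numerics.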

The extremogram is a useful tool for measuring extremal dependence and checking model adequacy in a time series. We define the extremogram in the spatial domain when the data are observed on a lattice or at locations distributed as a Poisson point process in *d*-dimensional space. We establish a central limit theorem for the empirical spatial extremogram. We show that the required conditions hold for max-moving average processes and Brown–Resnick processes and illustrate the empirical extremogram's performance via simulation. We also demonstrate its practical use with data on rainfall in a region of Florida and on ground-level ozone in the eastern United States.

Modern systems of official statistics require the estimation and publication of business statistics for disaggregated domains, for example, industry domains and geographical regions. Outlier robust methods have proven to be useful for small-area estimation. Recently proposed outlier robust model-based small-area methods assume, however, uncorrelated random effects. Spatial dependencies, resulting from similar industry domains or geographic regions, often occur. In this paper, we propose an outlier robust small-area methodology that allows for the presence of spatial correlation in the data. In particular, we present a robust predictive methodology that incorporates the potential spatial impact from other areas (domains) on the small area (domain) of interest. We further propose two parametric bootstrap methods for estimating the mean-squared error. Simulations indicate that the proposed methodology may lead to efficiency gains. The paper concludes with an illustrative application by using business data for estimating average labour costs in Italian provinces.

A common approach taken in high-dimensional regression analysis is sliced inverse regression, which separates the range of the response variable into non-overlapping regions, called ‘slices’. Asymptotic results are usually shown assuming that the slices are fixed, while in practice, estimators are computed with random slices containing the same number of observations. Based on empirical process theory, we present a unified theoretical framework to study these techniques, and revisit popular inverse regression estimators. Furthermore, we introduce a bootstrap methodology that reproduces the laws of the Cramér–von Mises test statistics used to assess the model dimension, the effects of specified covariates and whether or not a sliced inverse regression estimator is appropriate. Finally, we investigate the accuracy of different bootstrap procedures by means of simulations.

In this paper, I explore the use of positive definite metric tensors derived from second derivative information in the context of the simplified manifold Metropolis-adjusted Langevin algorithm. I propose a new adaptive step size procedure that resolves the shortcomings of such metric tensors in regions where the log-target has near-zero curvature in some direction. The adaptive step size selection also appears to alleviate the need for the different tuning parameters in transient and stationary regimes that is typical of the Metropolis-adjusted Langevin algorithm. The combination of metric tensors derived from second derivative information and the adaptive step size selection constitutes a large step towards developing reliable manifold Markov chain Monte Carlo methods that can be implemented automatically for models with unknown or intractable Fisher information, and even for target distributions that do not admit factorization into prior and likelihood. Through examples of low to moderate dimension, I show that the proposed methodology performs very well relative to alternative Markov chain Monte Carlo methods.

The Hartley–Rao–Cochran sampling design is an unequal probability sampling design that can be used to select samples from finite populations. We propose to adapt the empirical likelihood approach to the Hartley–Rao–Cochran sampling design. The proposed approach intrinsically incorporates sampling weights and auxiliary information, and allows for large sampling fractions. It can be used to construct confidence intervals. In a simulation study, we show that coverage may be better for the empirical likelihood confidence interval than for standard confidence intervals based on variance estimates. The proposed approach is simple to implement and less computationally intensive than the bootstrap. The proposed confidence interval does not rely on re-sampling, linearization, variance estimation, design effects or joint inclusion probabilities.

In this paper, a penalized weighted least squares approach is proposed for small area estimation under the unit level model. The new method not only unifies the traditional empirical best linear unbiased prediction, which does not take the sampling design into account, and the pseudo-empirical best linear unbiased prediction, which incorporates sampling weights, but also enjoys a desirable robustness to model misspecification compared with existing methods. The empirical small area estimator is given, and the corresponding second-order approximation to the mean squared error estimator is derived. Numerical comparisons based on synthetic and real data sets show the superior performance of the proposed method over currently available estimators in the literature.

Item non-response in surveys occurs when some, but not all, variables are missing. Unadjusted estimators tend to exhibit some bias, called the non-response bias, if the respondents differ from the non-respondents with respect to the study variables. In this paper, we focus on item non-response, which is usually treated by some form of single imputation. We examine the properties of doubly robust imputation procedures, which are those that lead to an estimator that remains consistent if either the outcome variable or the non-response mechanism is adequately modelled. We establish the double robustness property of the imputed estimator of the finite population distribution function under random hot-deck imputation within classes. We also discuss the links between our approach and that of Chambers and Dunstan. The results of a simulation study support our findings.
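Random hot-deck imputation within classes, as used above, can be sketched in a few lines (the data, classes and seed below are made up):

```python
import random

def hot_deck_impute(values, classes, seed=42):
    """Random hot-deck imputation within classes: each missing value
    (None) is replaced by a draw, uniform with replacement, from the
    observed values (donors) in the same imputation class."""
    rng = random.Random(seed)
    donors = {}
    for v, c in zip(values, classes):
        if v is not None:
            donors.setdefault(c, []).append(v)
    return [v if v is not None else rng.choice(donors[c])
            for v, c in zip(values, classes)]

y = [10.0, None, 12.0, 30.0, None, 31.0]
g = ['a', 'a', 'a', 'b', 'b', 'b']
print(hot_deck_impute(y, g))
```

The double robustness result concerns estimators, such as the imputed distribution function, computed from the completed data produced by this kind of procedure.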

This paper considers quantile regression for a wide class of time series models including autoregressive and moving average (ARMA) models with asymmetric generalized autoregressive conditional heteroscedasticity errors. The classical mean-variance models are reinterpreted as conditional location-scale models so that the quantile regression method can be naturally geared into the considered models. The consistency and asymptotic normality of the quantile regression estimator is established in location-scale time series models under mild conditions. In the application of this result to ARMA-generalized autoregressive conditional heteroscedasticity models, more primitive conditions are deduced to obtain the asymptotic properties. For illustration, a simulation study and a real data analysis are provided.

Directed acyclic graph (DAG) models—also called Bayesian networks—are widely used in probabilistic reasoning, machine learning and causal inference. If latent variables are present, then the set of possible marginal distributions over the remaining (observed) variables is generally not represented by any DAG. Larger classes of mixed graphical models have been introduced to overcome this; however, as we show, these classes are not sufficiently rich to capture all the marginal models that can arise. We introduce a new class of hyper-graphs, called mDAGs, and a latent projection operation to obtain an mDAG from the margin of a DAG. We show that each distinct marginal of a DAG model is represented by at least one mDAG and provide graphical results towards characterizing equivalence of these models. Finally, we show that mDAGs correctly capture the marginal structure of causally interpreted DAGs under interventions on the observed variables.

A framework for the asymptotic analysis of local power properties of tests of stationarity in time series analysis is developed. Appropriate sequences of locally stationary processes are defined that converge at a controlled rate to a limiting stationary process as the length of the time series increases. Different interesting classes of local alternatives to the null hypothesis of stationarity are then considered, and the local power properties of some recently proposed, frequency domain-based tests for stationarity are investigated. Some simulations illustrate our theoretical findings.

The timing of a time-dependent treatment—for example, when to perform a kidney transplantation—is an important factor for evaluating treatment efficacy. A naïve comparison between the treated and untreated groups, while ignoring the timing of treatment, typically yields biased results that might favour the treated group because only patients who survive long enough will get treated. On the other hand, studying the effect of a time-dependent treatment is often complex, as it involves modelling treatment history and accounting for the possible time-varying nature of the treatment effect. We propose a varying-coefficient Cox model that investigates the efficacy of a time-dependent treatment by utilizing a global partial likelihood, which renders appealing statistical properties, including consistency, asymptotic normality and semiparametric efficiency. Extensive simulations verify the finite sample performance, and we apply the proposed method to study the efficacy of kidney transplantation for end-stage renal disease patients in the US Scientific Registry of Transplant Recipients.

Outlier detection algorithms are intimately connected with robust statistics that down-weight some observations to zero. We define a number of outlier detection algorithms related to the Huber-skip and least trimmed squares estimators, including the one-step Huber-skip estimator and the forward search. Next, we review a recently developed asymptotic theory of these algorithms. Finally, we analyse the gauge, the fraction of wrongly detected outliers, for a number of outlier detection algorithms and establish an asymptotic normal theory and a Poisson theory for the gauge.
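The gauge can be illustrated on simulated outlier-free data with a simple hard cut (not one of the article's algorithms): with cut-off *c* on standardized residuals, the expected gauge is the normal tail probability 2(1 − Φ(*c*)).

```python
import random

def gauge(xs, c):
    """Fraction of observations flagged as outliers by the cut |x| > c;
    on clean data this is the fraction of wrongly detected outliers."""
    return sum(1 for x in xs if abs(x) > c) / len(xs)

rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
c = 2.576  # approximate 99.5% standard normal quantile
print(gauge(clean, c))  # close to the nominal gauge of 0.01
```

The article's normal and Poisson theories describe the fluctuations of exactly this fraction when the cut is applied within iterative estimators rather than to raw data.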

It is argued that model selection and robust estimation should be handled jointly. Impulse indicator saturation makes that possible, but leads to the situation where there are more variables than observations. This is illustrated by revisiting the analysis of Tobin's food data.

This article examines the recently proposed sequentially normalized least squares criterion for the linear regression subset selection problem. A simplified formula for computation of the criterion is presented, and an expression for its asymptotic form is derived without the assumption of normally distributed errors. Asymptotic consistency is proved in two senses: (i) in the usual sense, where the sample size tends to infinity, and (ii) in a non-standard sense, where the sample size is fixed and the noise variance tends to zero.

Length-biased sampling data are often encountered in the studies of economics, industrial reliability, epidemiology, genetics and cancer screening. The complication of this type of data is due to the fact that the observed lifetimes suffer from left truncation and right censoring, where the left truncation variable has a uniform distribution. In the Cox proportional hazards model, Huang & Qin (*Journal of the American Statistical Association*, 107, 2012, p. 107) proposed a composite partial likelihood method that not only has the simplicity of the popular partial likelihood estimator but can also be implemented easily in standard statistical software. The accelerated failure time model has become a useful alternative to the Cox proportional hazards model. In this paper, by using the composite partial likelihood technique, we study this model with length-biased sampling data. The proposed method has a very simple form and is robust when the assumption that the censoring time is independent of the covariate is violated. To ease the difficulty of calculations when solving the non-smooth estimating equation, we use a kernel-smoothed estimation method (Heller, *Journal of the American Statistical Association*, 102, 2007, p. 552). Large sample results and a re-sampling method for the variance estimation are discussed. Some simulation studies are conducted to compare the performance of the proposed method with other existing methods. A real data set is used for illustration.

Non-parametric estimation and bootstrap techniques play an important role in many areas of statistics. In the point process context, kernel intensity estimation has been limited to exploratory analysis because of its inconsistency, and some consistent alternatives have been proposed. Furthermore, most authors have considered kernel intensity estimators with scalar bandwidths, which can be very restrictive. This work focuses on a consistent kernel intensity estimator with an unconstrained bandwidth matrix. We propose a smooth bootstrap for inhomogeneous spatial point processes. The consistency of the bootstrap mean integrated squared error (MISE) as an estimator of the MISE of the consistent kernel intensity estimator proves the validity of the resampling procedure. Finally, we propose a plug-in bandwidth selection procedure based on the bootstrap MISE and compare its performance with that of several currently used methods through both a simulation study and an application to the spatial pattern of wildfires registered in Galicia (Spain) during 2006.
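A bare-bones Gaussian kernel intensity estimator with a non-scalar (here diagonal) bandwidth matrix can be sketched as follows; the edge correction and the consistency adjustment discussed above are omitted, and the point pattern and bandwidths are made up.

```python
import math

def intensity_estimate(points, x, y, H):
    """Kernel intensity estimate at (x, y) for a 2D point pattern,
    using a Gaussian kernel with diagonal bandwidth matrix
    H = (h1^2, h2^2); no edge correction is applied."""
    h1sq, h2sq = H
    norm = 2 * math.pi * math.sqrt(h1sq * h2sq)
    return sum(math.exp(-0.5 * ((x - px) ** 2 / h1sq + (y - py) ** 2 / h2sq))
               for px, py in points) / norm

pts = [(0.2, 0.3), (0.25, 0.35), (0.8, 0.9)]  # a small cluster plus one point
print(intensity_estimate(pts, 0.22, 0.32, H=(0.01, 0.04)))
```

An unconstrained bandwidth matrix would additionally carry an off-diagonal term, letting the kernel align with anisotropic patterns; the plug-in procedure of the paper chooses H by minimizing the bootstrap MISE.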

Likelihood-based inference with missing data is challenging because the observed log likelihood is often an (intractable) integration over the missing data distribution, which also depends on the unknown parameter. Approximating the integral by Monte Carlo sampling does not necessarily lead to a valid likelihood over the entire parameter space because the Monte Carlo samples are generated from a distribution with a fixed parameter value.

We consider approximating the observed log likelihood based on importance sampling. In the proposed method, the dependency of the integral on the parameter is properly reflected through fractional weights. We discuss constructing a confidence interval using the profile likelihood ratio test. A Newton–Raphson algorithm is employed to find the interval end points. Two limited simulation studies show the advantage of the Wilks inference over the Wald inference in terms of power, parameter space conformity and computational efficiency. A real data example on salamander mating shows that our method also works well with high-dimensional missing data.
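The fractional-weight idea can be sketched on a toy model in which the missing datum *x* is N(θ, 1) and the observation is *y* = *x* + standard normal noise; the proposal parameter θ₀, the sample size and the data are all made up. The same importance draws yield a likelihood approximation that is a smooth function of θ over the whole parameter space.

```python
import math
import random

def norm_pdf(x, mu, sd=1.0):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def obs_loglik(theta, y, x_draws, theta0):
    """Importance sampling approximation of the observed log likelihood
    log of integral f(y|x) p(x; theta) dx, using draws from p(.; theta0)
    carrying the fractional weights p(x; theta) / p(x; theta0)."""
    total = sum(norm_pdf(y, x) * norm_pdf(x, theta) / norm_pdf(x, theta0)
                for x in x_draws)
    return math.log(total / len(x_draws))

rng = random.Random(7)
theta0, y = 0.0, 1.0
x_draws = [rng.gauss(theta0, 1.0) for _ in range(50_000)]
# marginally y ~ N(theta, 2), so the exact log likelihood at theta = 1
# is -0.5 * log(4 * pi), and the approximation peaks near theta = y
print(obs_loglik(1.0, y, x_draws, theta0), obs_loglik(2.0, y, x_draws, theta0))
```

Profiling this approximated log likelihood and inverting the likelihood ratio test, as proposed, then gives Wilks-type confidence intervals.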

In the existing statistical literature, the almost default choice for inference on inhomogeneous point processes is the most well-known model class: reweighted second-order stationary processes. In particular, the *K*-function related to this type of inhomogeneity is presented as *the* inhomogeneous *K*-function.

In some applications, the failure time of interest is the time from an originating event to a failure event, while both event times are interval censored. We propose fitting Cox proportional hazards models to this type of data using a spline-based sieve maximum marginal likelihood, where the time to the originating event is integrated out in the empirical likelihood function of the failure time of interest. This greatly reduces the complexity of the objective function compared with the fully semiparametric likelihood. The dependence of the time of interest on the time to the originating event is induced by including the latter as a covariate in the proportional hazards model for the failure time of interest. The use of splines results in a higher rate of convergence of the estimator of the baseline hazard function compared with the usual non-parametric estimator. The computation of the estimator is facilitated by a multiple imputation approach. Asymptotic theory is established, and a simulation study is conducted to assess the method's finite sample performance. The method is also applied to a real data set on AIDS incubation time.

The mean residual life measures the expected remaining life of a subject who has survived up to a particular time. When the survival time distribution is highly skewed or heavy tailed, the restricted mean residual life should be considered instead. In this paper, we propose an additive–multiplicative restricted mean residual life model to study the association between the restricted mean residual life function and potential regression covariates in the presence of right censoring. This model extends the proportional mean residual life model by using an additive model as its covariate-dependent baseline. In the suggested model, some covariate effects are allowed to be time-varying. To estimate the model parameters, martingale estimating equations are developed, and the large sample properties of the resulting estimators are established. In addition, to assess the adequacy of the model, we investigate a goodness-of-fit test that is asymptotically justified. The proposed methodology is evaluated via simulation studies and further applied to a kidney cancer data set collected from a clinical trial.

This paper focuses on a situation in which a set of treatments is associated with a response through a set of supplementary variables in linear models as well as discrete models. In this situation, we demonstrate that the causal effect can be estimated more accurately from the set of supplementary variables. In addition, we show that the set of supplementary variables can also include selection variables and proxy variables. Furthermore, we propose selection criteria for supplementary variables based on the estimation accuracy of causal effects. From graph structures based on our results, we can identify situations in which the causal effect can be estimated more accurately by supplementary variables and reliably evaluate the causal effects from observed data.

We establish the local asymptotic normality property for a class of ergodic parametric jump-diffusion processes with state-dependent intensity and known volatility function sampled at high frequency. We prove that the inference problem about the drift and jump parameters is adaptive with respect to parameters in the volatility function that can be consistently estimated.

In survival analysis, we sometimes encounter data with multiple censored outcomes. Under certain scenarios, partial or even all covariates have ‘similar’ relative risks on the multiple outcomes in the Cox regression analysis. The similarity in covariate effects can be quantified using the proportionality of regression coefficients. Identifying the proportionality structure, or equivalently whether covariates have individual or collective effects, may have important scientific implications. In addition, it can lead to a smaller set of unknown parameters, which in turn results in more accurate estimation. In this article, we develop a novel approach for identifying the proportionality structure. Simulation shows the satisfactory performance of the proposed approach and its advantage over estimation under no assumed structure. We analyse three datasets to demonstrate the practical application of the proposed approach.

We propose a new class of semiparametric estimators for proportional hazards models in the presence of measurement error in the covariates, where the baseline hazard function, the hazard function for the censoring time, and the distribution of the true covariates are considered as unknown infinite-dimensional parameters. We estimate the model components by solving estimating equations based on the semiparametric efficient scores under a sequence of restricted models where the logarithms of the hazard functions are approximated by reduced rank regression splines. The proposed estimators are locally efficient in the sense that the estimators are semiparametrically efficient if the distribution of the error-prone covariates is specified correctly and are still consistent and asymptotically normal if the distribution is misspecified. Our simulation studies show that the proposed estimators have smaller biases and variances than competing methods. We further illustrate the new method with a real application in an HIV clinical trial.

The estimation of abundance from presence–absence data is an intriguing problem in applied statistics. The classical Poisson model makes strong independence and homogeneity assumptions and in practice generally underestimates the true abundance. A controversial *ad hoc* method based on negative-binomial counts (Am. Nat.) has been empirically successful but lacks theoretical justification. We first present an alternative estimator of abundance based on a paired negative binomial model that is consistent and asymptotically normally distributed. A quadruple negative binomial extension is also developed, which yields the previous *ad hoc* approach and resolves the controversy in the literature. We examine the performance of the estimators in a simulation study and estimate the abundance of 44 tree species in a permanent forest plot.
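The classical Poisson estimator mentioned above has a simple closed form: if quadrat counts are Poisson(λ), then P(occupied) = 1 − e^(−λ), so λ can be recovered from the occupied fraction. The short sketch below (our own illustration, not the paper's negative binomial estimators) also reproduces the underestimation under clustered counts:

```python
import numpy as np

def poisson_abundance(presence):
    """Classical Poisson estimator: P(occupied) = 1 - exp(-lam), so lam = -log(1 - p_hat)."""
    p_hat = np.mean(presence)
    return -np.log1p(-p_hat)

# Clustered counts (geometric, mean 2): the Poisson estimator underestimates
# because clustering concentrates individuals in fewer occupied quadrats.
rng = np.random.default_rng(0)
counts = rng.negative_binomial(1, 1 / 3, size=20000)   # mean n(1-p)/p = 2
lam_hat = poisson_abundance(counts > 0)                # P(occupied) = 2/3, so lam_hat is near -log(1/3), well below 2
```

This is exactly the bias that motivates the negative binomial estimators studied in the paper.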

We consider a dependent thinning of a regular point process with the aim of obtaining aggregation on the large scale and regularity on the small scale in the resulting target point process of retained points. Various parametric models for the underlying processes are suggested and the properties of the target point process are studied. Simulation and inference procedures are discussed when a realization of the target point process is observed, depending on whether the thinned points are observed or not. The paper extends previous work by Dietrich Stoyan on interrupted point processes.
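A rough simulation sketch of this construction (our own illustration; the paper's parametric models and inference procedures are more elaborate): thin a Poisson pattern to a Matérn-type hard-core pattern for small-scale regularity, then apply a spatially varying retention probability to induce large-scale aggregation in the retained points:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. Homogeneous Poisson points on the unit square
n = rng.poisson(500)
pts = rng.uniform(size=(n, 2))

# 2. Matérn-I hard-core thinning: delete every point with a neighbour
#    closer than r, giving small-scale regularity
r = 0.02
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
regular = pts[d.min(axis=1) >= r]

# 3. Dependent thinning by a smooth retention field (hypothetical cluster
#    centres): retained points aggregate on the large scale
centres = rng.uniform(size=(3, 2))

def p_retain(x):
    d2 = ((x[:, None] - centres[None, :]) ** 2).sum(-1)
    return np.clip(np.exp(-d2 / 0.02).max(axis=1), 0.05, 1.0)

target = regular[rng.uniform(size=len(regular)) < p_retain(regular)]
```

The target pattern inherits the hard-core distance r from step 2 while clustering around the centres from step 3, combining regularity and aggregation at different scales.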

A problem with using a non-convex penalty for sparse regression is that the penalized sum of squared residuals has multiple local minima, and it is not known which one is a good estimator. The aim of this paper is to provide guidance on designing a non-convex penalty that has the strong oracle property. Here, the strong oracle property means that the oracle estimator is the unique local minimum of the objective function. We summarize three definitions of the oracle property – the global, weak and strong oracle properties. Then, we give sufficient conditions for the weak oracle property, which means that the oracle estimator becomes a local minimum. We give an example of non-convex penalties that possess the weak oracle property but not the strong oracle property. Finally, we give a necessary condition for the strong oracle property.
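A concrete non-convex example is the minimax concave penalty (MCP), whose one-dimensional penalized least-squares problem has a closed-form "firm thresholding" solution for γ > 1 (a standard illustration, not taken from the paper): small signals are set exactly to zero while large signals are left unshrunk, the behaviour underlying oracle-type properties:

```python
import numpy as np

def mcp_threshold(z, lam, gamma):
    """Minimizer of 0.5 * (z - t)**2 + MCP(t; lam, gamma) over t, for gamma > 1.

    MCP(t) = lam * |t| - t**2 / (2 * gamma) for |t| <= gamma * lam,
    and is constant (= gamma * lam**2 / 2) beyond that.
    """
    a = np.abs(z)
    if a <= lam:
        return 0.0                                    # small signals killed
    if a <= gamma * lam:
        return np.sign(z) * (a - lam) / (1.0 - 1.0 / gamma)   # partial shrinkage
    return z                                          # large signals unshrunk
```

Unlike soft thresholding (the lasso), which shrinks every coefficient by λ, this map is unbiased for large |z|; the multiple-minima issue the paper addresses arises when such univariate problems are coupled through a design matrix.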