For small area estimation with area-level data, the Fay–Herriot model is extensively used as a model-based method. The model conventionally assumes that the sampling variances are known, whereas in practice estimators of the sampling variances are used. The assumption of known sampling variances is therefore unrealistic, and several methods have been proposed to overcome this problem. In this paper, we assume that direct estimators of the sampling variances are available in addition to the sample means. Using this information, we propose a Bayesian yet objective method that produces shrinkage estimates of both means and variances in the Fay–Herriot model. We consider a hierarchical structure for the sampling variances and place uniform priors on the model parameters to maintain the objectivity of the proposed model. To establish the validity of the posterior inference, we show under mild conditions that the posterior distribution is proper and has finite variances. We investigate the numerical performance through simulation and empirical studies.

Spatio-temporal modelling is an increasingly popular topic in Statistics. Our paper contributes to this line of research by developing the theory, simulation and inference for a spatio-temporal Ornstein–Uhlenbeck process. We conduct detailed simulation studies and demonstrate the practical relevance of these processes in an empirical study of radiation anomaly data. Finally, we describe how predictions can be carried out in the Gaussian setting.

Skew-symmetric families of distributions such as the skew-normal and skew-*t* represent supersets of the normal and *t* distributions, and they exhibit richer classes of extremal behaviour. By defining a non-stationary skew-normal process, which allows the easy handling of positive definite, non-stationary covariance functions, we derive a new family of max-stable processes – the extremal skew-*t* process. This is a family of non-stationary processes that includes the stationary extremal-*t* processes as special cases. We provide the spectral representation and the resulting angular densities of the extremal skew-*t* process and illustrate its practical implementation.

Multivariate extreme value statistical analysis is concerned with observations on several variables which are thought to possess some degree of tail dependence. The main approaches to inference for multivariate extremes consist in approximating either the distribution of block component-wise maxima or the distribution of the exceedances over a high threshold. Although the expressions of the asymptotic density functions of these distributions may be characterized, they cannot be computed in general. In this paper, we study the case where the spectral random vector of the multivariate max-stable distribution has known conditional distributions. The asymptotic density functions of the multivariate extreme value distributions may then be written through univariate integrals that are easily computed or simulated. The asymptotic properties of two likelihood estimators are presented, and the utility of the method is examined via simulation.

Data augmentation is required for the implementation of many Markov chain Monte Carlo (MCMC) algorithms. The inclusion of augmented data can often lead to conditional distributions from well-known probability distributions for some of the parameters in the model. In such cases, collapsing (integrating out parameters) has been shown to improve the performance of MCMC algorithms. We show how integrating out the infection rate parameter in epidemic models leads to efficient MCMC algorithms for two very different epidemic scenarios, final outcome data from a multitype SIR epidemic and longitudinal data from a spatial SI epidemic. The resulting MCMC algorithms give fresh insight into real-life epidemic data sets.

In the analysis of semi-competing risks data interest lies in estimation and inference with respect to a so-called non-terminal event, the observation of which is subject to a terminal event. Multi-state models are commonly used to analyse such data, with covariate effects on the transition/intensity functions typically specified via the Cox model and dependence between the non-terminal and terminal events specified, in part, by a unit-specific shared frailty term. To ensure identifiability, the frailties are typically assumed to arise from a parametric distribution, specifically a Gamma distribution with mean 1.0 and variance, say, *σ*^{2}. When the frailty distribution is misspecified, however, the resulting estimator is not guaranteed to be consistent, with the extent of asymptotic bias depending on the discrepancy between the assumed and true frailty distributions. In this paper, we propose a novel class of transformation models for semi-competing risks analysis that permit the non-parametric specification of the frailty distribution. To ensure identifiability, the class restricts to parametric specifications of the transformation and the error distribution; the latter are flexible, however, and cover a broad range of possible specifications. We also derive the semi-parametric efficient score under the complete data setting and propose a non-parametric score imputation method to handle right censoring; consistency and asymptotic normality of the resulting estimators are derived, and small-sample operating characteristics are evaluated via simulation. Although the proposed semi-parametric transformation model and non-parametric score imputation method are motivated by the analysis of semi-competing risks data, they are broadly applicable to any analysis of multivariate time-to-event outcomes in which a unit-specific shared frailty is used to account for correlation.
Finally, the proposed model and estimation procedures are applied to a study of hospital readmission among patients diagnosed with pancreatic cancer.

It is the main purpose of this paper to study the asymptotics of certain variants of the empirical process in the context of survey data. Precisely, Functional Central Limit Theorems are established under usual conditions when the sample is drawn from a Poisson or a rejective sampling design. The framework we develop encompasses sampling designs with non-uniform first order inclusion probabilities, which can be chosen so as to optimize estimation accuracy. Applications to Hadamard differentiable functionals are considered.

Motivated from problems in canonical correlation analysis, reduced rank regression and sufficient dimension reduction, we introduce a double dimension reduction model where a single index of the multivariate response is linked to the multivariate covariate through a single index of these covariates, hence the name double single index model. Because nonlinear association between two sets of multivariate variables can be arbitrarily complex and even intractable in general, we aim at seeking a principal one-dimensional association structure where a response index is fully characterized by a single predictor index. The functional relation between the two single-indices is left unspecified, allowing flexible exploration of any potential nonlinear association. We argue that such double single index association is meaningful and easy to interpret, and the rest of the multi-dimensional dependence structure can be treated as nuisance in model estimation. We investigate the estimation and inference of both indices and the regression function, and derive the asymptotic properties of our procedure. We illustrate the numerical performance in finite samples and demonstrate the usefulness of the modelling and estimation procedure in a multi-covariate multi-response problem concerning concrete.

This paper presents a goodness-of-fit test for parametric regression models with scalar response and directional predictor, that is, a vector on a sphere of arbitrary dimension. The testing procedure is based on the weighted squared distance between a smooth and a parametric regression estimator, where the smooth regression estimator is obtained by a projected local approach. Asymptotic behaviour of the test statistic under the null hypothesis and local alternatives is provided, jointly with a consistent bootstrap algorithm for application in practice. A simulation study illustrates the performance of the test in finite samples. The procedure is applied to test a linear model in text mining.

Focusing on the model selection problems in the family of Poisson mixture models (including the Poisson mixture regression model with random effects and zero-inflated Poisson regression model with random effects), the current paper derives two conditional Akaike information criteria. The criteria are the unbiased estimators of the conditional Akaike information based on the conditional log-likelihood and the conditional Akaike information based on the joint log-likelihood, respectively. The derivation is free from the specific parametric assumptions about the conditional mean of the true data-generating model and applies to different types of estimation methods. Additionally, the derivation is not based on the asymptotic argument. Simulations show that the proposed criteria have promising estimation accuracy. In addition, it is found that the criterion based on the conditional log-likelihood demonstrates good model selection performance under different scenarios. Two sets of real data are used to illustrate the proposed method.

In this paper, we reconsider the mixture vector autoregressive model, which was proposed in the literature for modelling non-linear time series. We complete and extend the stationarity conditions, derive a matrix formula in closed form for the autocovariance function of the process and prove a result on stable vector autoregressive moving-average representations of mixture vector autoregressive models. For these results, we apply techniques related to a Markovian representation of vector autoregressive moving-average processes. Furthermore, we analyse maximum likelihood estimation of model parameters by using the expectation–maximization algorithm and propose a new iterative algorithm for getting the maximum likelihood estimates. Finally, we study the model selection problem and testing procedures. Several examples, simulation experiments and an empirical application based on monthly financial returns illustrate the proposed procedures.

In geostatistics and also in other applications in science and engineering, it is now common to perform updates on Gaussian process models with many thousands or even millions of components. These large-scale inferences involve modelling, representational and computational challenges. We describe a visualization tool for large-scale Gaussian updates, the ‘medal plot’. The medal plot shows the updated uncertainty at each observation location and also summarizes the sharing of information across observations, as a proxy for the sharing of information across the state vector (or latent process). As such, it reflects characteristics of both the observations and the statistical model. We illustrate with an application to assess mass trends in the Antarctic Ice Sheet, for which there are strong constraints from the observations and the physics.

Uniformly most powerful Bayesian tests (UMPBTs) are a new class of Bayesian tests in which null hypotheses are rejected if their Bayes factor exceeds a specified threshold. The alternative hypotheses in UMPBTs are defined to maximize the probability that the null hypothesis is rejected. Here, we generalize the notion of UMPBTs by restricting the class of alternative hypotheses over which this maximization is performed, resulting in restricted most powerful Bayesian tests (RMPBTs). We then derive RMPBTs for linear models by restricting alternative hypotheses to *g*-priors. For linear models, the rejection regions of RMPBTs coincide with those of the corresponding classical frequentist tests.

The paper proposes a new test for detecting the umbrella pattern under a general non-parametric scheme. The alternative asserts that the umbrella ordering holds, while the hypothesis is its complement. The main focus is on controlling the power function of the test outside the alternative. As a result, the asymptotic type I error of the constructed test is smaller than or equal to the fixed significance level *α* on the whole set where the umbrella ordering does not hold. This error is also controlled to a satisfactory extent under finite sample sizes. A simulation study shows, among other things, that the new test improves upon the solution widely recommended in the literature on the subject. A routine, written in R, is attached as the Supporting Information file.

For multivariate survival data, we study the generalized method of moments (GMM) approach to estimation and inference based on the marginal additive hazards model. We propose an efficient iterative algorithm using closed-form solutions, which dramatically reduces the computational burden. Asymptotic normality of the proposed estimators is established, and the corresponding variance–covariance matrix can be consistently estimated. Inference procedures are derived based on the asymptotic chi-squared distribution of the GMM objective function. Simulation studies are conducted to empirically examine the finite sample performance of the proposed method, and a real data example from a dental study is used for illustration.

An extended single-index model is considered when responses are missing at random. A three-step estimation procedure is developed to define an estimator for the single-index parameter vector by a joint estimating equation. The proposed estimator is shown to be asymptotically normal. An algorithm for computing this estimator is proposed. This algorithm only involves one-dimensional nonparametric smoothers, thereby avoiding the data sparsity problem caused by high model dimensionality. Some simulation studies are conducted to investigate the finite sample performances of the proposed estimators.

Log-normal linear regression models are popular in many fields of research. Bayesian estimation of the conditional mean of the dependent variable is problematic as many choices of the prior for the variance (on the log-scale) lead to posterior distributions with no finite moments. We propose a generalized inverse Gaussian prior for this variance and derive the conditions on the prior parameters that yield posterior distributions of the conditional mean of the dependent variable with finite moments up to a pre-specified order. The conditions depend on one of the three parameters of the suggested prior; the other two have an influence on inferences for small and medium sample sizes. A second goal of this paper is to discuss how to choose these parameters according to different criteria including the optimization of frequentist properties of posterior means.

Influential units occur frequently in surveys, especially in business surveys that collect economic variables whose distributions are highly skewed. A unit is said to be influential when its inclusion or exclusion from the sample has an important impact on the sampling error of estimates. We extend the concept of conditional bias attached to a unit and propose a robust version of the double expansion estimator, which depends on a tuning constant. We determine the tuning constant that minimizes the maximum estimated conditional bias. Our results can be naturally extended to the case of unit nonresponse, the set of respondents often being viewed as a second-phase sample. A robust version of calibration estimators, based on auxiliary information available at both phases, is also constructed.

We are concerned with a situation in which we would like to test multiple hypotheses with tests whose *p*-values cannot be computed explicitly but can be approximated using Monte Carlo simulation. This scenario occurs widely in practice. We are interested in obtaining the same rejections and non-rejections as the ones obtained if the *p*-values for all hypotheses had been available. The present article introduces a framework for this scenario by providing a generic algorithm for a general multiple testing procedure. We establish conditions that guarantee that the rejections and non-rejections obtained through Monte Carlo simulations are identical to the ones obtained with the *p*-values. Our framework is applicable to a general class of step-up and step-down procedures, which includes many established multiple testing corrections such as the ones of Bonferroni, Holm, Sidak, Hochberg or Benjamini–Hochberg. Moreover, we show how to use our framework to improve algorithms available in the literature in such a way as to yield theoretical guarantees on their results. These modifications can easily be implemented in practice and lead to a particular way of reporting multiple testing results as three sets together with an error bound on their correctness, which we demonstrate using a real biological dataset.
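As an illustration of the three-set reporting idea, the following sketch (our illustration, not the authors' algorithm) classifies hypotheses as surely rejected, surely not rejected or undecided by comparing Clopper–Pearson bounds for each Monte Carlo *p*-value estimate against the Bonferroni threshold *α*/*m*; the framework in the abstract extends this idea, with theoretical guarantees, to general step-up and step-down procedures.

```python
import numpy as np
from scipy.stats import beta

def mc_three_sets(exceed_counts, n_sim, alpha=0.05, conf=0.999):
    """Classify each hypothesis as rejected, not rejected or undecided.

    exceed_counts[i] is the number of the n_sim simulated test statistics
    at least as extreme as the observed one for hypothesis i, so
    p_hat_i = exceed_counts[i] / n_sim estimates the true p-value.
    A hypothesis is decided only when the whole Clopper-Pearson interval
    for its p-value lies on one side of the Bonferroni threshold.
    """
    m = len(exceed_counts)
    thr = alpha / m
    a = (1 - conf) / 2
    rejected, not_rejected, undecided = [], [], []
    for i, k in enumerate(exceed_counts):
        lo = beta.ppf(a, k, n_sim - k + 1) if k > 0 else 0.0
        hi = beta.ppf(1 - a, k + 1, n_sim - k) if k < n_sim else 1.0
        if hi < thr:
            rejected.append(i)       # p-value surely below the threshold
        elif lo > thr:
            not_rejected.append(i)   # p-value surely above the threshold
        else:
            undecided.append(i)      # more simulations would be needed
    return rejected, not_rejected, undecided
```

Hypotheses in the undecided set are exactly those for which further Monte Carlo replications would be required before the decision matches the one based on the exact *p*-values.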

Linear structural equation models, which relate random variables via linear interdependencies and Gaussian noise, are a popular tool for modelling multivariate joint distributions. The models correspond to mixed graphs that include both directed and bidirected edges representing the linear relationships and correlations between noise terms, respectively. A question of interest for these models is that of parameter identifiability, whether or not it is possible to recover edge coefficients from the joint covariance matrix of the random variables. For the problem of determining generic parameter identifiability, we present an algorithm building upon the half-trek criterion. Underlying our new algorithm is the idea that ancestral subsets of vertices in the graph can be used to extend the applicability of a decomposition technique.

Compositional tables – a continuous counterpart to contingency tables – carry relative information about relationships between row and column factors; thus, for their analysis, only ratios between cells of a table are informative. Consequently, the standard Euclidean geometry should be replaced by the Aitchison geometry on the simplex, which enables decomposition of the table into its independent and interactive parts. The aim of the paper is to find an interpretable coordinate representation for independent and interaction tables (in the sense of balances and odds ratios of cells, respectively), in which further statistical processing of compositional tables can be performed. Theoretical results are applied to real-world problems from a health survey and in macroeconomics.

We find the asymptotic distribution of the multi-dimensional multi-scale and kernel estimators for high-frequency financial data with microstructure noise. Sampling times are allowed to be asynchronous and endogenous. In the process, we show that the classes of multi-scale and kernel estimators for smoothing noise perturbation are asymptotically equivalent, in the sense of having the same asymptotic distribution for corresponding kernel and weight functions. The theory leads to multi-dimensional stable central limit theorems and feasible versions thereof. Hence, it allows one to draw statistical inference for a broad class of multivariate models, which paves the way to tests and confidence intervals in risk measurement for arbitrary portfolios composed of assets observed at high frequency. As an application, we extend the approach to construct a test of the hypothesis that correlated assets are independent conditional on a common factor.

Linear increments (LI) are used to analyse repeated outcome data with missing values. Previously, two LI methods have been proposed, one allowing non-monotone missingness but not independent measurement error and one allowing independent measurement error but only monotone missingness. In both, it was suggested that the expected increment could depend on current outcome. We show that LI can allow non-monotone missingness and either independent measurement error of unknown variance or dependence of expected increment on current outcome but not both. A popular alternative to LI is a multivariate normal model ignoring the missingness pattern. This gives consistent estimation when data are normally distributed and missing at random (MAR). We clarify the relation between MAR and the assumptions of LI and show that for continuous outcomes multivariate normal estimators are also consistent under (non-MAR and non-normal) assumptions not much stronger than those of LI. Moreover, when missingness is non-monotone, they are typically more efficient.

We consider the problem of parameter estimation for inhomogeneous space-time shot-noise Cox point processes. We explore the possibility of using a stepwise estimation method and dimensionality-reducing techniques to estimate different parts of the model separately.

We discuss the estimation method using projection processes and propose a refined method that avoids projection onto the temporal domain. This remedies the main flaw of the method using projection processes – the possible overlap in the projection process of clusters that are clearly separated in the original space-time process. This issue is more prominent in the temporal projection process, where the amount of information lost by projection is higher than in the spatial projection process.

For the refined method, we derive consistency and asymptotic normality results under the increasing domain asymptotics and appropriate moment and mixing assumptions. We also present a simulation study that suggests that cluster overlapping is successfully overcome by the refined method.

Regression discontinuity designs (RD designs) are used as a method for causal inference from observational data, where the decision to apply an intervention is made according to a ‘decision rule’ that is linked to some continuous variable. Such designs are increasingly being used in medicine. The local average treatment effect (LATE) has been established as an estimator of the intervention effect in an RD design, particularly where a design's ‘decision rule’ is not adhered to strictly. Estimating the variance of the LATE is not necessarily straightforward. We consider three approaches to the estimation of the LATE: two-stage least squares, likelihood-based and a Bayesian approach. We compare these under a variety of simulated RD designs and a real example concerning the prescription of statins based on cardiovascular disease risk score.
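To make the two-stage least squares approach concrete, here is a minimal sketch (our illustration, not the authors' implementation) of the local Wald/2SLS estimator of the LATE in a fuzzy RD design, using simple local means within a bandwidth of the cutoff:

```python
import numpy as np

def fuzzy_rd_late(score, treated, outcome, cutoff=0.0, bandwidth=1.0):
    """Local Wald estimator of the LATE in a fuzzy RD design.

    Within a window around the cutoff, the jump in the mean outcome is
    divided by the jump in the treatment probability; this equals the
    two-stage least squares estimate with instrument Z = 1(score >= cutoff)
    (local constants only, no local-linear adjustment).
    """
    keep = np.abs(score - cutoff) <= bandwidth
    z = score[keep] >= cutoff
    d, y = treated[keep], outcome[keep]
    jump_y = y[z].mean() - y[~z].mean()   # reduced form: jump in outcome
    jump_d = d[z].mean() - d[~z].mean()   # first stage: jump in uptake
    return jump_y / jump_d
```

In a sharp design the decision rule is followed exactly, so the first-stage jump equals one and the estimator reduces to the difference in mean outcomes at the cutoff.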

Right-censored and length-biased failure time data arise in many fields including cross-sectional prevalent cohort studies, and their analysis has recently attracted a great deal of attention. It is well-known that for regression analysis of failure time data, two commonly used approaches are hazard-based and quantile-based procedures, and most of the existing methods are the hazard-based ones. In this paper, we consider quantile regression analysis of right-censored and length-biased data and present a semiparametric varying-coefficient partially linear model. For estimation of regression parameters, a three-stage procedure that makes use of the inverse probability weighted technique is developed, and the asymptotic properties of the resulting estimators are established. In addition, the approach allows the dependence of the censoring variable on covariates, while most of the existing methods assume the independence between censoring variables and covariates. A simulation study is conducted and suggests that the proposed approach works well in practical situations. Also, an illustrative example is provided.

Directed acyclic graph (DAG) models—also called Bayesian networks—are widely used in probabilistic reasoning, machine learning and causal inference. If latent variables are present, then the set of possible marginal distributions over the remaining (observed) variables is generally not represented by any DAG. Larger classes of mixed graphical models have been introduced to overcome this; however, as we show, these classes are not sufficiently rich to capture all the marginal models that can arise. We introduce a new class of hyper-graphs, called mDAGs, and a latent projection operation to obtain an mDAG from the margin of a DAG. We show that each distinct marginal of a DAG model is represented by at least one mDAG and provide graphical results towards characterizing equivalence of these models. Finally, we show that mDAGs correctly capture the marginal structure of causally interpreted DAGs under interventions on the observed variables.

The timing of a time-dependent treatment—for example, when to perform a kidney transplantation—is an important factor for evaluating treatment efficacy. A naïve comparison between the treated and untreated groups, while ignoring the timing of treatment, typically yields biased results that might favour the treated group because only patients who survive long enough will get treated. On the other hand, studying the effect of a time-dependent treatment is often complex, as it involves modelling treatment history and accounting for the possible time-varying nature of the treatment effect. We propose a varying-coefficient Cox model that investigates the efficacy of a time-dependent treatment by utilizing a global partial likelihood, which yields appealing statistical properties, including consistency, asymptotic normality and semiparametric efficiency. Extensive simulations verify the finite sample performance, and we apply the proposed method to study the efficacy of kidney transplantation for end-stage renal disease patients in the US Scientific Registry of Transplant Recipients.

A framework for the asymptotic analysis of local power properties of tests of stationarity in time series analysis is developed. Appropriate sequences of locally stationary processes are defined that converge at a controlled rate to a limiting stationary process as the length of the time series increases. Different interesting classes of local alternatives to the null hypothesis of stationarity are then considered, and the local power properties of some recently proposed, frequency domain-based tests for stationarity are investigated. Some simulations illustrate our theoretical findings.

Item non-response in surveys occurs when some, but not all, of the variables are missing for a sampled unit. Unadjusted estimators tend to exhibit some bias, called the non-response bias, if the respondents differ from the non-respondents with respect to the study variables. In this paper, we focus on item non-response, which is usually treated by some form of single imputation. We examine the properties of doubly robust imputation procedures, namely those that lead to an estimator that remains consistent if either the outcome variable or the non-response mechanism is adequately modelled. We establish the double robustness property of the imputed estimator of the finite population distribution function under random hot-deck imputation within classes. We also discuss the links between our approach and that of Chambers and Dunstan. The results of a simulation study support our findings.
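The random hot-deck imputation within classes mentioned above can be sketched as follows (a hypothetical minimal implementation, not the authors' code): each missing value is replaced by a value drawn at random, with replacement, from the observed donors in the same imputation class.

```python
import numpy as np

def hot_deck_impute(y, miss, cls, seed=None):
    """Random hot-deck imputation within classes.

    y:    study variable (float array; entries with miss=True are ignored)
    miss: boolean mask marking item non-respondents
    cls:  imputation class label for every unit
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float).copy()
    for c in np.unique(cls):
        in_c = cls == c
        recipients = np.flatnonzero(in_c & miss)
        if recipients.size == 0:
            continue
        donors = y[in_c & ~miss]  # observed values in the same class
        if donors.size == 0:
            raise ValueError(f"no respondents to donate in class {c!r}")
        y[recipients] = rng.choice(donors, size=recipients.size, replace=True)
    return y
```

Because every imputed value is an actually observed value from the same class, the empirical distribution within each class is preserved, which is what makes the imputed estimator of the distribution function attractive in this setting.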

This paper considers quantile regression for a wide class of time series models, including autoregressive and moving average (ARMA) models with asymmetric generalized autoregressive conditional heteroscedasticity errors. The classical mean-variance models are reinterpreted as conditional location-scale models so that the quantile regression method can be naturally applied to the considered models. The consistency and asymptotic normality of the quantile regression estimator are established in location-scale time series models under mild conditions. In the application of this result to ARMA-generalized autoregressive conditional heteroscedasticity models, more primitive conditions are deduced to obtain the asymptotic properties. For illustration, a simulation study and a real data analysis are provided.

The Hartley–Rao–Cochran sampling design is an unequal probability sampling design which can be used to select samples from finite populations. We propose to adapt the empirical likelihood approach to the Hartley–Rao–Cochran sampling design. The proposed approach intrinsically incorporates sampling weights and auxiliary information, and it allows for large sampling fractions. It can be used to construct confidence intervals. In a simulation study, we show that the coverage may be better for the empirical likelihood confidence interval than for standard confidence intervals based on variance estimates. The proposed approach is simple to implement and less computationally intensive than the bootstrap. The proposed confidence interval does not rely on re-sampling, linearization, variance estimation, design effects or joint inclusion probabilities.

In this paper, a penalized weighted least squares approach is proposed for small area estimation under the unit level model. The new method not only unifies the traditional empirical best linear unbiased prediction that does not take sampling design into account and the pseudo-empirical best linear unbiased prediction that incorporates sampling weights but also has the desirable robustness property to model misspecification compared with existing methods. The empirical small area estimator is given, and the corresponding second-order approximation to mean squared error estimator is derived. Numerical comparisons based on synthetic and real data sets show superior performance of the proposed method to currently available estimators in the literature.

The extremogram is a useful tool for measuring extremal dependence and checking model adequacy in a time series. We define the extremogram in the spatial domain when the data are observed on a lattice or at locations distributed as a Poisson point process in *d*-dimensional space. We establish a central limit theorem for the empirical spatial extremogram. We show that its conditions hold for max-moving average processes and Brown–Resnick processes and illustrate the empirical extremogram's performance via simulation. We also demonstrate its practical use with a data set related to rainfall in a region in Florida and ground-level ozone in the eastern United States.
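For intuition, the empirical extremogram on a regularly spaced one-dimensional series (the simplest lattice case; the paper treats general lattices and Poisson-sampled locations in *d* dimensions) can be sketched as:

```python
import numpy as np

def empirical_extremogram(x, q, max_lag):
    """Empirical extremogram rho(h) = P(X_{t+h} > q | X_t > q) on a
    one-dimensional lattice, estimated for lags h = 1, ..., max_lag as
    the ratio of the joint exceedance frequency to the marginal one."""
    x = np.asarray(x, dtype=float)
    exc = x > q                       # exceedance indicators
    p_marg = exc.mean()               # marginal exceedance frequency
    return np.array([np.mean(exc[:-h] & exc[h:]) / p_marg
                     for h in range(1, max_lag + 1)])
```

For serially independent data the extremogram is flat at the marginal exceedance probability; values above it at small lags indicate extremal clustering.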

Let *X* be lognormal(*μ*,*σ*^{2}) with density *f*(*x*); let *θ* > 0 and define the Laplace transform *L*(*θ*) = E[e^{−*θX*}]. We study properties of the exponentially tilted density (Esscher transform) *f*_{θ}(*x*) = e^{−θx}*f*(*x*)/*L*(*θ*), in particular its moments, its asymptotic form as *θ* → ∞ and asymptotics for the saddlepoint *θ*(*x*) determined by E_{θ(x)}[*X*] = *x* (equivalently, −*L*′(*θ*(*x*))/*L*(*θ*(*x*)) = *x*). The asymptotic formulas involve the Lambert W function. The established relations are used to provide two different numerical methods for evaluating the left tail probability of the sum of lognormals *S*_{n}=*X*_{1}+⋯+*X*_{n}: a saddlepoint approximation and an exponential tilting importance sampling estimator. For the latter, we demonstrate logarithmic efficiency. Numerical examples for the cdf *F*_{n}(*x*) and the pdf *f*_{n}(*x*) of *S*_{n} are given for a range of values of *σ*^{2}, *n* and *x* motivated by portfolio value-at-risk calculations.
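The saddlepoint *θ*(*x*) has no closed form, but it is straightforward to compute numerically. The sketch below (our illustration, with *L*(*θ*) = E[e^{−θX}] evaluated by quadrature since the lognormal Laplace transform has no closed form) solves the saddlepoint equation E_{θ}[*X*] = *x* for a single lognormal; the paper's interest is in the asymptotics of this root and its use in approximating the left tail of *S*_{n}.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq
from scipy.stats import lognorm

mu, sigma = 0.0, 1.0
f = lognorm(s=sigma, scale=np.exp(mu)).pdf   # lognormal(mu, sigma^2) density

def L(theta):
    # Laplace transform L(theta) = E[exp(-theta X)], computed numerically
    return quad(lambda x: np.exp(-theta * x) * f(x), 0, np.inf)[0]

def tilted_mean(theta):
    # Mean of the tilted density f_theta: E_theta[X] = -L'(theta)/L(theta)
    num = quad(lambda x: x * np.exp(-theta * x) * f(x), 0, np.inf)[0]
    return num / L(theta)

def saddlepoint(x, lo=1e-8, hi=50.0):
    # E_theta[X] decreases in theta, so the left tail (x < E[X])
    # corresponds to a unique root theta(x) > 0 of E_theta[X] = x
    return brentq(lambda t: tilted_mean(t) - x, lo, hi)
```

The further *x* lies below E[*X*] = e^{μ+σ²/2}, the larger the tilting parameter *θ*(*x*), which is the regime where the Lambert W asymptotics of the abstract become relevant.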

In this paper, I explore the use of positive definite metric tensors derived from second derivative information in the context of the simplified manifold Metropolis adjusted Langevin algorithm. I propose a new adaptive step size procedure that resolves the shortcomings of such metric tensors in regions where the log-target has near-zero curvature in some direction. The adaptive step size selection also appears to alleviate the need for different tuning parameters in the transient and stationary regimes that is typical of the Metropolis adjusted Langevin algorithm. The combination of metric tensors derived from second derivative information and the adaptive step size selection constitutes a large step towards developing reliable manifold Markov chain Monte Carlo methods that can be implemented automatically for models with unknown or intractable Fisher information, and even for target distributions that do not admit factorization into prior and likelihood. Through examples of low to moderate dimension, I show that the proposed methodology performs very well relative to alternative Markov chain Monte Carlo methods.

Modern systems of official statistics require the estimation and publication of business statistics for disaggregated domains, for example, industry domains and geographical regions. Outlier robust methods have proven to be useful for small-area estimation. Recently proposed outlier robust model-based small-area methods assume, however, uncorrelated random effects. Spatial dependencies, resulting from similar industry domains or geographic regions, often occur. In this paper, we propose an outlier robust small-area methodology that allows for the presence of spatial correlation in the data. In particular, we present a robust predictive methodology that incorporates the potential spatial impact from other areas (domains) on the small area (domain) of interest. We further propose two parametric bootstrap methods for estimating the mean-squared error. Simulations indicate that the proposed methodology may lead to efficiency gains. The paper concludes with an illustrative application by using business data for estimating average labour costs in Italian provinces.

A common approach taken in high-dimensional regression analysis is sliced inverse regression, which separates the range of the response variable into non-overlapping regions, called ‘slices’. Asymptotic results are usually established assuming that the slices are fixed, while in practice estimators are computed with random slices containing equal numbers of observations. Based on empirical process theory, we present a unified theoretical framework to study these techniques and revisit popular inverse regression estimators. Furthermore, we introduce a bootstrap methodology that reproduces the laws of the Cramér–von Mises test statistics of interest, which concern the model dimension, the effects of specified covariates and whether or not a sliced inverse regression estimator is appropriate. Finally, we investigate the accuracy of different bootstrap procedures by means of simulations.
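For concreteness, the basic slicing computation behind these estimators (a textbook sliced inverse regression sketch, not the paper's bootstrap methodology) works as follows: whiten the predictors, average them within slices defined by the order statistics of the response, and extract the leading eigenvectors of the between-slice covariance.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Estimate effective dimension reduction directions by SIR."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    Sigma = Xc.T @ Xc / n
    # whitening transform Sigma^{-1/2} via the spectral decomposition
    w, V = np.linalg.eigh(Sigma)
    S_inv_half = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ S_inv_half
    # slices of (nearly) equal size based on the sorted response
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # leading eigenvectors of the between-slice covariance, mapped back
    _, vecs = np.linalg.eigh(M)
    B = S_inv_half @ vecs[:, ::-1][:, :n_dirs]
    return B / np.linalg.norm(B, axis=0)
```

Note the random-slicing point made in the abstract: the slice boundaries here depend on the observed responses, which is what separates practice from the fixed-slice asymptotics.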

Pseudo-values have proven very useful in censored data analysis in complex settings such as multi-state models. The approach was originally suggested by Andersen *et al.* (Biometrika, 90, 2003, 335), who also proposed estimating standard errors using classical generalized estimating equation results. These results were studied more formally by Graw *et al.* (Lifetime Data Anal., 15, 2009, 241), who derived some key results based on a second-order von Mises expansion. However, results concerning the large sample properties of estimates based on regression models for pseudo-values still seem unclear. In this paper, we study these large sample properties in the simple setting of survival probabilities and show that the estimating function can be written as a U-statistic of second order, giving rise to an additional term that does not vanish asymptotically. We further show that previously advocated standard error estimates will typically be too large, although in many practical applications the difference will be of minor importance. We show how to correctly estimate the variability of the estimator. This is further examined in simulation studies.
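To fix ideas, the pseudo-value for subject *i* and a survival probability *S*(*t*) is the jackknife-type quantity *n Ŝ*(*t*) − (*n* − 1) *Ŝ*^{(−i)}(*t*) based on the Kaplan–Meier estimator. A minimal sketch (assuming continuous, untied event times; an illustration, not the paper's estimators):

```python
import numpy as np

def km_surv(time, event, t):
    """Kaplan-Meier estimate of S(t); event = 1 for death, 0 for censoring."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    s, at_risk = 1.0, len(time)
    for i in range(len(time)):
        if time[i] > t:
            break
        if event[i]:
            s *= 1 - 1 / at_risk
        at_risk -= 1
    return s

def pseudo_values(time, event, t):
    """Jackknife pseudo-observations for the survival probability S(t)."""
    n = len(time)
    full = km_surv(time, event, t)
    pv = np.empty(n)
    for i in range(n):
        mask = np.ones(n, dtype=bool)
        mask[i] = False
        pv[i] = n * full - (n - 1) * km_surv(time[mask], event[mask], t)
    return pv
```

A useful sanity check: without censoring the pseudo-values reduce exactly to the indicators 1{*T*_{i} > *t*}, so regressing them on covariates is then ordinary binary regression.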

Subgroup detection has received increasing attention recently in different fields such as clinical trials, public management and market segmentation analysis. In these fields, people often face time-to-event data, which are commonly subject to right censoring. This paper proposes a semiparametric Logistic-Cox mixture model for subgroup analysis when the outcome of interest is an event time subject to right censoring. The proposed method mainly consists of a likelihood ratio-based testing procedure for testing the existence of subgroups. The expectation–maximization iteration is applied to improve the testing power, and a model-based bootstrap approach is developed to implement the testing procedure. When subgroups exist, the proposed model can also be used to estimate the subgroup effect and to construct predictive scores for subgroup membership. The large sample properties of the proposed method are studied. The finite sample performance of the proposed method is assessed by simulation studies. A real data example is also provided for illustration.

Simultaneous confidence bands have been shown in the statistical literature to be powerful inferential tools in univariate linear regression. While the methodology of simultaneous confidence bands for univariate linear regression has been extensively researched and is well developed, no published work seems available for multivariate linear regression. This paper fills this gap by studying one particular simultaneous confidence band for multivariate linear regression. Because of the shape of the band, the word ‘tube’ is more pertinent and so will be used in place of the word ‘band’. It is shown that the construction of the tube is related to the distribution of the largest eigenvalue. A simulation-based method is proposed to compute the 1 − *α* quantile of this eigenvalue. With the computational power of modern computers, the simultaneous confidence tube can be computed quickly and accurately. A real-data example is used to illustrate the method, and several potential research problems are pointed out.
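The simulation idea can be sketched generically: draw replicates of the relevant random matrix and take the empirical 1 − *α* quantile of its largest eigenvalue. The sketch below uses a standard Wishart matrix purely as an illustration; the actual matrix distribution is the one derived in the paper.

```python
import numpy as np

def largest_eig_quantile(p, m, alpha=0.05, reps=20000, rng=None):
    """Monte Carlo (1 - alpha) quantile of the largest eigenvalue of a
    Wishart_p(m) matrix Z Z', where Z is p x m with standard normal entries."""
    rng = np.random.default_rng(rng)
    vals = np.empty(reps)
    for i in range(reps):
        Z = rng.standard_normal((p, m))
        vals[i] = np.linalg.eigvalsh(Z @ Z.T).max()
    return np.quantile(vals, 1 - alpha)
```

For *p* = 1 the Wishart reduces to a chi-squared distribution with *m* degrees of freedom, which gives a closed-form check on the simulated quantile.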

Functional data analysis has become an important area of research because of its ability to handle high-dimensional and complex data structures. However, development has been limited in the context of linear mixed effect models and, in particular, of small area estimation. Linear mixed effect models are the backbone of small area estimation. In this article, we consider area-level data and fit a varying coefficient linear mixed effect model in which the varying coefficients are semiparametrically modelled via B-splines. We propose a method for estimating the fixed effect parameters and consider prediction of the random effects that can be implemented using standard software. For measuring prediction uncertainty, we derive an analytical expression for the mean squared errors and propose a method for estimating them. The procedure is illustrated with a real data example, and the operating characteristics of the method are assessed using finite sample simulation studies.
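A minimal sketch of the fixed-effect part of such a model (my own illustration, ignoring the random effects and the mean squared error estimation): expand the varying coefficient *β*(*z*) in a cubic B-spline basis and fit the resulting linear model by least squares. The SciPy B-spline design matrix is used for the basis expansion.

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
n = 2000
z = rng.uniform(0.01, 2.99, n)        # variable over which the coefficient varies
x = rng.standard_normal(n)            # covariate whose effect is beta(z)
y = x * np.sin(z) + 0.1 * rng.standard_normal(n)   # true beta(z) = sin(z)

# cubic B-spline basis on [0, 3] with equally spaced knots
k = 3
t = np.concatenate(([0.0] * k, np.linspace(0.0, 3.0, 8), [3.0] * k))
B = BSpline.design_matrix(z, t, k).toarray()       # n x (len(t) - k - 1)

# least squares fit of y on x * B(z): beta_hat(z) = B(z) @ gamma
gamma, *_ = np.linalg.lstsq(x[:, None] * B, y, rcond=None)

zg = np.linspace(0.1, 2.9, 50)
beta_hat = BSpline.design_matrix(zg, t, k).toarray() @ gamma
```

Because the model is linear in the spline coefficients *γ*, adding area-level random effects turns this into an ordinary linear mixed model, which is why standard mixed model software suffices for fitting.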

Frame corrections have been studied in census applications for a long time. One very promising method is *dual system estimation*, which is based on capture–recapture models. These methods have recently been applied in the USA, England, Israel and Switzerland. To gain information on subgroups of the population, structure preserving estimators can be applied [i.e. structure preserving estimation (SPREE) and generalized SPREE]. The present paper extends the SPREE approach with an alternative distance function, the chi-square distance. The new method yields improved estimates in our application with very small domains. A comparative study based on a large-scale Monte Carlo simulation elaborates on the advantages and disadvantages of the estimators in the context of the German register-assisted Census 2011.