Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply *l*_{1}-penalties to either parametric likelihoods, or regularized regression/pseudolikelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudolikelihood-based objective functions have provable convergence guarantees, it is not clear whether corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. We propose a new pseudolikelihood-based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprising quadratic forms. The objective is then optimized via a co-ordinatewise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established by using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well defined under very general conditions and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, we demonstrate the approach on simulated and real data, with timing comparisons and an assessment of numerical convergence. We also present a novel unifying framework that places all graphical pseudolikelihood methods as special cases of a more general formulation, leading to important insights.

Functional data are traditionally assumed to be observed on the same domain. Motivated by a data set of heart rate temporal profiles, we develop methodology for the analysis of incomplete functional samples where each curve may be observed on a subset of the domain and unobserved elsewhere. We formalize this observation regime and develop the fundamental procedures of functional data analysis for this framework: estimation of parameters (mean and covariance operator) and principal component analysis. Principal scores of a partially observed function cannot be computed directly and we solve this challenging issue by estimating their best predictions as linear functionals of the observed part of the trajectory. Next, we propose a functional completion procedure that recovers the missing part by using the observed part of the curve. We construct prediction intervals for principal scores and bands for missing parts of trajectories. The prediction problems are seen to be ill-posed inverse problems; regularization techniques are used to obtain a stable solution. A simulation study shows the good performance of our methods. We illustrate the methods on the heart rate data and provide practical computational algorithms, theoretical arguments and proofs of all results.

A framework for causal inference from two-level factorial designs is proposed, which uses potential outcomes to define causal effects. The paper explores the effect of non-additivity of unit level treatment effects on Neyman's repeated sampling approach for estimation of causal effects and on Fisher's randomization tests on sharp null hypotheses in these designs. The framework allows for statistical inference from a finite population, permits definition and estimation of estimands other than ‘average factorial effects’ and leads to more flexible inference procedures than those based on ordinary least squares estimation from a linear model.

Dependent phenomena, such as relational, spatial and temporal phenomena, tend to be characterized by local dependence in the sense that units which are close in a well-defined sense are dependent. In contrast with spatial and temporal phenomena, though, relational phenomena tend to lack a natural neighbourhood structure in the sense that it is unknown which units are close and thus dependent. Owing to the challenge of characterizing local dependence and constructing random graph models with local dependence, many conventional exponential family random graph models induce strong dependence and are not amenable to statistical inference. We take first steps to characterize local dependence in random graph models, inspired by the notion of finite neighbourhoods in spatial statistics and *M*-dependence in time series, and we show that local dependence endows random graph models with desirable properties which make them amenable to statistical inference. We show that random graph models with local dependence satisfy a natural domain consistency condition which every model should satisfy but which conventional exponential family random graph models do not. In addition, we establish a central limit theorem for random graph models with local dependence, which suggests that random graph models with local dependence are amenable to statistical inference. We discuss how random graph models with local dependence can be constructed by exploiting either observed or unobserved neighbourhood structure. In the absence of observed neighbourhood structure, we take a Bayesian view and express the uncertainty about the neighbourhood structure by specifying a prior on a set of suitable neighbourhood structures. We present simulation results and applications to two real world networks with ‘ground truth’.

Fitting regression models for intensity functions of spatial point processes is of great interest in ecological and epidemiological studies of association between spatially referenced events and geographical or environmental covariates. When Cox or cluster process models are used to accommodate clustering that is not accounted for by the available covariates, likelihood-based inference becomes computationally cumbersome owing to the complicated nature of the likelihood function and the associated score function. It is therefore of interest to consider alternative, more easily computable estimating functions. We derive the optimal estimating function in a class of first-order estimating functions. The optimal estimating function depends on the solution of a certain Fredholm integral equation which in practice is solved numerically. The derivation of the optimal estimating function has close similarities to the derivation of quasi-likelihood for standard data sets. The approximate solution is further equivalent to a quasi-likelihood score for binary spatial data. We therefore use the term quasi-likelihood for our optimal estimating function approach. We demonstrate in a simulation study and a data example that our quasi-likelihood method for spatial point processes is both statistically and computationally efficient.

We consider sparse spatial mixed linear models, particularly those described by Besag and Higdon, and develop an *h*-likelihood method for their statistical inference. The method proposed allows for singular precision matrices, as it produces estimates that coincide with those from the residual maximum likelihood based on appropriate differencing of the data and has a novel approach to estimating precision parameters by a gamma linear model. Furthermore, we generalize the *h*-likelihood method to include continuum spatial variations by making explicit use of scaling limit connections between Gaussian intrinsic Markov random fields on regular arrays and the de Wijs process. Keeping various applications of spatial mixed linear models in mind, we devise a novel sparse conjugate gradient algorithm that allows us to achieve fast matrix-free statistical computations. We provide two applications. The first is an extensive analysis of an agricultural variety trial that brings forward various new aspects of nearest neighbour adjustment, such as the effects of changes of scale on statistical analyses and the use of an implicit continuum spatial formulation. The second application concerns an analysis of a large cotton field which gives a focus to matrix-free computations. The paper closes with some further considerations, such as applications to irregularly spaced data, use of the parametric bootstrap and some generalizations to the Gaussian Matérn mixed effect models.

In the absence of relevant prior experience, popular Bayesian estimation techniques usually begin with some form of ‘uninformative’ prior distribution intended to have minimal inferential influence. The Bayes rule will still produce nice looking estimates and credible intervals, but these lack the logical force that is attached to experience-based priors and require further justification. The paper concerns the frequentist assessment of Bayes estimates. A simple formula is shown to give the frequentist standard deviation of a Bayesian point estimate. The same simulations as required for the point estimate also produce the standard deviation. Exponential family models make the calculations particularly simple and bring in a connection to the parametric bootstrap.

Mediation analysis is an important tool in social and medical sciences as it helps to understand why an intervention works. The commonly used approach, given by Baron and Kenny, requires the strong assumption ‘sequential ignorability’ to yield causal interpretation. Ten Have and his colleagues proposed a rank preserving model to relax this assumption. However, the rank preserving model is restricted to the case with binary intervention and single mediator and needs another strong assumption ‘rank preserving’. We propose a new model that can relax this assumption and can handle both multilevel intervention and multicomponent mediators. As an estimating-equation-based method, our model can handle both correlated data with the generalized estimating equation and missing data with inverse probability weighting. Finally, our method can also be used in many other research settings, using mathematical models similar to mediation analysis, such as treatment compliance and post-randomized treatment component analysis. For the causal mediation model proposed, we first show identifiability for the parameters in the model. We then propose a semiparametric method for estimating the model parameters and derive asymptotic results for the estimators proposed. Simulation shows good performance for the proposed estimators in finite sample sizes. Finally, we apply the method proposed to two real world clinical studies: the college student drinking study, and the ‘Improving mood promoting access to collaborative treatment for late life depression’ study.

Time series segmentation, which is also known as multiple-change-point detection, is a well-established problem. However, few solutions have been designed specifically for high dimensional situations. Our interest is in segmenting the second-order structure of a high dimensional time series. In a generic step of a binary segmentation algorithm for multivariate time series, one natural solution is to combine cumulative sum statistics obtained from local periodograms and cross-periodograms of the components of the input time series. However, the standard ‘maximum’ and ‘average’ methods for doing so often fail in high dimensions when, for example, the change points are sparse across the panel or the cumulative sum statistics are spuriously large. We propose the sparsified binary segmentation algorithm which aggregates the cumulative sum statistics by adding only those that pass a certain threshold. This ‘sparsifying’ step reduces the influence of irrelevant noisy contributions, which is particularly beneficial in high dimensions. To show the consistency of sparsified binary segmentation, we introduce the multivariate locally stationary wavelet model for time series, which is a separate contribution of this work.
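As a toy sketch of the ‘sparsifying’ step, consider a panel of mean-shift series in place of the local periodogram and cross-periodogram statistics of the actual method: each series contributes its CUSUM curve to the aggregate only where that curve exceeds a threshold. All numerical settings below (threshold value, signal size, panel dimensions) are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cusum_stats(x):
    # |CUSUM| statistic at each candidate change point b = 1, ..., n-1
    n = len(x)
    b = np.arange(1, n)
    s = np.cumsum(x)[:-1]                     # partial sums S_1, ..., S_{n-1}
    return np.abs(np.sqrt(b * (n - b) / n) * (s / b - (x.sum() - s) / (n - b)))

def sparsified_aggregate(panel, threshold):
    # add only the CUSUM values that pass the threshold ("sparsifying" step),
    # so series carrying no change contribute (almost) nothing to the aggregate
    agg = np.zeros(panel.shape[1] - 1)
    for row in panel:
        c = cusum_stats(row)
        agg += np.where(c > threshold, c, 0.0)
    return agg

rng = np.random.default_rng(0)
panel = rng.normal(size=(20, 200))
panel[:3, 100:] += 2.0                        # change at t = 100 in 3 of 20 series
est = int(np.argmax(sparsified_aggregate(panel, threshold=2.5))) + 1
print(est)                                    # close to the true location 100
```

The thresholding is what makes the aggregate robust when change points are sparse across the panel: a plain sum of all 20 CUSUM curves would dilute the signal carried by only 3 of them.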

We consider estimation in a sparse additive regression model with the design points on a regular lattice. We establish the minimax convergence rates over Sobolev classes and propose a Fourier-based rate optimal estimator which is adaptive to the unknown sparsity and smoothness of the response function. The estimator is derived within a Bayesian formalism but can be naturally viewed as a penalized maximum likelihood estimator with the complexity penalties on the number of non-zero univariate additive components of the response and on the numbers of the non-zero coefficients of their Fourier expansions. We compare it with several existing counterparts and perform a short simulation study to demonstrate its performance.

We consider statistical inference for time series linear regression where the response and predictor processes may experience general forms of abrupt and smooth non-stationary behaviours over time. Meanwhile, the regression parameters may be subject to linear inequality constraints. A simple and unified procedure for structural stability checks and parameter inference is proposed. In the case where the regression parameters are constrained, the methodology proposed is shown to be consistent whether or not the true regression parameters are on the boundary of the restricted parameter space, by utilizing an asymptotically invariant geometric property of polyhedral cones.

We address the problem of dimension reduction for time series of functional data (*X*_{t}: *t* ∈ ℤ). Such *functional time series* frequently arise, for example, when a continuous time process is segmented into some smaller natural units, such as days. Then each *X*_{t} represents one intraday curve. We argue that functional principal component analysis, though a key technique in the field and a benchmark for any competitor, does not provide an adequate dimension reduction in a time series setting. Functional principal component analysis indeed is a *static* procedure which ignores the essential information that is provided by the serial dependence structure of the functional data under study. Therefore, inspired by Brillinger's theory of *dynamic principal components*, we propose a *dynamic* version of functional principal component analysis which is based on a frequency domain approach. By means of a simulation study and an empirical illustration, we show the considerable improvement that the dynamic approach entails when compared with the usual static procedure.

We consider estimation of the causal effect of a binary treatment on an outcome, conditionally on covariates, from observational studies or natural experiments in which there is a binary instrument for treatment. We describe a doubly robust, locally efficient estimator of the parameters indexing a model for the local average treatment effect conditionally on covariates **V** when randomization of the instrument is only true conditionally on a high dimensional vector of covariates **X**, possibly bigger than **V**. We discuss the surprising result that inference is identical to inference for the parameters of a model for an additive treatment effect on the treated conditionally on **V** that assumes no treatment–instrument interaction. We illustrate our methods with the estimation of the local average effect of participating in 401(k) retirement programmes on savings by using data from the US Census Bureau's 1991 Survey of Income and Program Participation.

In many applications we have both observational and (randomized) interventional data. We propose a Gaussian likelihood framework for joint modelling of such different data types, based on global parameters consisting of a directed acyclic graph and corresponding edge weights and error variances. Thanks to the global nature of the parameters, maximum likelihood estimation is reasonable with only one or few data points per intervention. We prove consistency of the Bayesian information criterion for estimating the interventional Markov equivalence class of directed acyclic graphs which is smaller than the observational analogue owing to increased partial identifiability from interventional data. Such an improvement in identifiability has immediate implications for tighter bounds for inferring causal effects. Besides methodology and theoretical derivations, we present empirical results from real and simulated data.

Estimation of extreme value parameters from observations in the max-domain of attraction of a multivariate max-stable distribution commonly uses aggregated data such as block maxima. Multivariate peaks-over-threshold methods, in contrast, exploit additional information from the non-aggregated ‘large’ observations. We introduce an approach based on peaks over thresholds that provides several new estimators for processes *η* in the max-domain of attraction of the frequently used Hüsler–Reiss model and its spatial extension: Brown–Resnick processes. The method relies on increments *η*(·)−*η*(*t*_{0}) conditional on *η*(*t*_{0}) exceeding a high threshold, where *t*_{0} is a fixed location. When the marginals are standardized to the Gumbel distribution, these increments asymptotically form a Gaussian process, resulting in computationally simple estimates of the Hüsler–Reiss parameter matrix and, in particular, enabling parametric inference for Brown–Resnick processes based on (high dimensional) multivariate densities. This is a major advantage over composite likelihood methods that are commonly used in spatial extreme value statistics since they rely only on bivariate densities. A simulation study compares the performance of the new estimators with other commonly used methods. As an application, we fit a non-isotropic Brown–Resnick process to the extremes of 12-year data of daily wind speed measurements.

We consider causal inference in randomized survival studies with right-censored outcomes and all-or-nothing compliance, using semiparametric transformation models to estimate the distribution of survival times in treatment and control groups, conditionally on covariates and latent compliance type. Estimands depending on these distributions, e.g. the complier average causal effect, the complier effect on survival beyond time *t* and the complier quantile effect, are then considered. Maximum likelihood is used to estimate the parameters of the transformation models, using a specially designed expectation–maximization algorithm to overcome the computational difficulties that are created by the mixture structure of the problem and the infinite dimensional parameter in the transformation models. The estimators are shown to be consistent, asymptotically normal and semiparametrically efficient. Inferential procedures for the causal parameters are developed. A simulation study is conducted to evaluate the finite sample performance of the estimated causal parameters. We also apply our methodology to a randomized study conducted by the Health Insurance Plan of Greater New York to assess the reduction in breast cancer mortality due to screening.

Denote the loss return on the equity of a financial institution as *X* and that of the entire market as *Y*. For a given very small value of *p*>0, the marginal expected shortfall (MES) is defined as E(*X* | *Y* > *Q*_{Y}(1−*p*)), where *Q*_{Y}(1−*p*) is the (1−*p*)th quantile of the distribution of *Y*. The MES is an important factor when measuring the systemic risk of financial institutions. For a wide non-parametric class of bivariate distributions, we construct an estimator of the MES and establish the asymptotic normality of the estimator when *p*→0, as the sample size *n*→∞. Since we are in particular interested in the case *p*=*O*(1/*n*), we use extreme value techniques for deriving the estimator and its asymptotic behaviour. The finite sample performance of the estimator and the relevance of the limit theorem are shown in a detailed simulation study. We also apply our method to estimate the MES of three large US investment banks.
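A naive empirical plug-in version of the MES can be sketched by averaging *X* over the observations where *Y* exceeds its empirical (1−*p*)th quantile. The factor model, coefficients and the choice *p*=0.01 below are illustrative assumptions, and none of the paper's extreme value machinery is used.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
z = rng.standard_normal(n)                  # common market factor (toy model)
y = z + 0.5 * rng.standard_normal(n)        # market loss return Y
x = 0.8 * z + 0.6 * rng.standard_normal(n)  # institution loss return X

p = 0.01
q = np.quantile(y, 1 - p)                   # empirical (1-p)th quantile of Y
mes_hat = x[y > q].mean()                   # mean loss of X given Y is extreme
print(round(mes_hat, 2))
```

For *p* of the order 1/*n* only a handful of exceedances would remain, which is why the paper resorts to extreme value techniques rather than this empirical average.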

The inferential model (IM) framework provides valid prior-free probabilistic inference by focusing on predicting unobserved auxiliary variables. But, efficient IM-based inference can be challenging when the auxiliary variable is of higher dimension than the parameter. Here we show that features of the auxiliary variable are often fully observed and, in such cases, a simultaneous dimension reduction and information aggregation can be achieved by conditioning. This proposed conditioning strategy leads to efficient IM inference and casts new light on Fisher's notions of sufficiency, conditioning and also Bayesian inference. A differential-equation-driven selection of a conditional association is developed, and validity of the conditional IM is proved under some conditions. For problems that do not admit a conditional IM of the standard form, we propose a more flexible class of conditional IMs based on localization. Examples of local conditional IMs in a bivariate normal model and a normal variance components model are also given.

In general factorial designs where no homoscedasticity or a particular error distribution is assumed, the well-known Wald-type statistic is a simple asymptotically valid procedure. However, it is well known that it suffers from a poor finite sample approximation since the convergence to its *χ*^{2} limit distribution is quite slow. This becomes even worse with an increasing number of factor levels. The aim of the paper is to improve the small sample behaviour of the Wald-type statistic, maintaining its applicability to general settings such as crossed or hierarchically nested designs, by applying a modified permutation approach. In particular, it is shown that this approach approximates the null distribution of the Wald-type statistic not only under the null hypothesis but also under the alternative, yielding an asymptotically valid permutation test which is even finitely exact under exchangeability. Finally, its small sample behaviour is compared with competing procedures in an extensive simulation study.

We propose a non-parametric method to bootstrap locally stationary processes which combines a time domain wild bootstrap approach with a non-parametric frequency domain approach. The method generates pseudotime series which mimic, asymptotically correctly, the local second-order and, to the necessary extent, the fourth-order moment structure of the underlying process. Thus it can be applied to approximate the distribution of several statistics that are based on observations of the locally stationary process. We prove a bootstrap central limit theorem for a general class of statistics that can be expressed as functionals of the preperiodogram, the latter being a useful tool for inferring properties of locally stationary processes. Some simulations and a real data example shed light on the finite sample properties and illustrate the ability of the bootstrap method proposed.

Errors-in-variables regression is important in many areas of science and social science, e.g. in economics where it is often a feature of hedonic models, in environmental science where air quality indices are measured with error, in biology where the vegetative mass of plants is frequently obscured by mismeasurement and in nutrition where reported fat intake is typically subject to substantial error. To date, in non-parametric contexts, the great majority of work has focused on methods for estimating the mean as a function, with relatively little attention being paid to techniques for empirical assessment of the accuracy of the estimator. We develop methodologies for constructing confidence bands. Our contributions include techniques for tuning parameter choice aimed at minimizing the coverage error of confidence bands.

In longitudinal studies, it is of fundamental importance to understand the dynamics in the mean function, variance function and correlations of the repeated or clustered measurements. For modelling the covariance structure, Cholesky-type decomposition-based approaches have been demonstrated to be effective. However, parsimonious approaches for directly revealing the correlation structure between longitudinal measurements remain less well explored, and existing joint modelling approaches may encounter difficulty in interpreting the covariation structure. We propose a novel joint mean–variance–correlation modelling approach for longitudinal studies. By applying hyperspherical co-ordinates, we obtain an unconstrained parameterization for the correlation matrix that automatically guarantees its positive definiteness, and we develop a regression approach to model the correlation matrix of the longitudinal measurements by exploiting the parameterization. The modelling framework proposed is parsimonious, interpretable and flexible for analysing longitudinal data. Extensive data examples and simulations support the effectiveness of the approach proposed.
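The hyperspherical co-ordinate idea can be sketched generically: any lower-triangular array of angles in (0, π) gives rows of a Cholesky factor as unit vectors in hyperspherical co-ordinates, and the product is automatically a positive definite correlation matrix. This shows only the unconstrained parameterization, not the authors' regression model for the angles; `corr_from_angles` is a hypothetical helper name.

```python
import numpy as np

def corr_from_angles(theta):
    # Row i of the Cholesky factor L is a unit vector in hyperspherical
    # co-ordinates built from the angles theta[i-1, :i]; R = L @ L.T then
    # has unit diagonal and is positive definite for any angles in (0, pi).
    d = theta.shape[0] + 1
    L = np.zeros((d, d))
    L[0, 0] = 1.0
    for i in range(1, d):
        prod = 1.0
        for j in range(i):
            L[i, j] = np.cos(theta[i - 1, j]) * prod
            prod *= np.sin(theta[i - 1, j])
        L[i, i] = prod
    return L @ L.T

theta = np.tril(np.full((3, 3), np.pi / 3))   # arbitrary angles, 4 x 4 result
R = corr_from_angles(theta)
print(np.allclose(np.diag(R), 1.0))           # True: unit diagonal
print(np.all(np.linalg.eigvalsh(R) > 0))      # True: positive definite
```

Because the angles are unconstrained (any value in (0, π) is admissible), they can be modelled by an ordinary regression without ever leaving the space of valid correlation matrices.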

We consider heteroscedastic regression models where the mean function is a partially linear single-index model and the variance function depends on a generalized partially linear single-index model. We do not insist that the variance function depends only on the mean function, as happens in the classical generalized partially linear single-index model. We develop efficient and practical estimation methods for the variance function and for the mean function. Asymptotic theory for the parametric and non-parametric parts of the model is developed. Simulations illustrate the results. An empirical example involving ozone levels is used to illustrate the results further and is shown to be a case where the variance function does not depend on the mean function.

The paper develops a unified theoretical and computational framework for false discovery control in multiple testing of spatial signals. We consider both pointwise and clusterwise spatial analyses, and derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple-testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets. Numerical results show that the procedures proposed lead to more accurate error control and better power performance than conventional methods. We demonstrate our methods for analysing the time trends in tropospheric ozone in eastern USA.

Random effects or shared parameter models are commonly advocated for the analysis of combined repeated measurement and event history data, including dropout from longitudinal trials. Their use in practical applications has generally been limited by computational cost and complexity, meaning that only simple special cases can be fitted by using readily available software. We propose a new approach that exploits recent distributional results for the extended skew normal family to allow exact likelihood inference for a flexible class of random-effects models. The method uses a discretization of the timescale for the time-to-event outcome, which is often unavoidable in any case when events correspond to dropout. We place no restriction on the times at which repeated measurements are made. An analysis of repeated lung function measurements in a cystic fibrosis cohort is used to illustrate the method.

We study quantile regression when the response is an event time subject to potentially dependent censoring. We consider the semicompeting risks setting, where the time to censoring remains observable after the occurrence of the event of interest. Although such a scenario frequently arises in biomedical studies, most current quantile regression methods for censored data are not applicable because they generally require the censoring time and the event time to be independent. By imposing quite mild assumptions on the association structure between the time-to-event response and the censoring time variable, we propose quantile regression procedures, which allow us to garner a comprehensive view of the covariate effects on the event time outcome as well as to examine the informativeness of censoring. An efficient and stable algorithm is provided for implementing the new method. We establish the asymptotic properties of the resulting estimators including uniform consistency and weak convergence. The theoretical development may serve as a useful template for addressing estimation settings that involve stochastic integrals. Extensive simulation studies suggest that the method proposed performs well with moderate sample sizes. We illustrate the practical utility of our proposals through an application to a bone marrow transplant trial.

Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables.

Increasingly larger data sets of processes in space and time call for statistical models and methods that can cope with such data. We show that the solution of a stochastic advection–diffusion partial differential equation provides a flexible model class for spatiotemporal processes which is computationally feasible also for large data sets. The Gaussian process defined through the stochastic partial differential equation has, in general, a non-separable covariance structure. Its parameters can be physically interpreted as explicitly modelling phenomena such as transport and diffusion that occur in many natural processes in diverse fields ranging from environmental sciences to ecology. To obtain computationally efficient statistical algorithms, we use spectral methods to solve the stochastic partial differential equation. This has the advantage that approximation errors do not accumulate over time, and that in the spectral space the computational cost grows linearly with the dimension, the total computational cost of Bayesian or frequentist inference being dominated by the fast Fourier transform. The model proposed is applied to post-processing of precipitation forecasts from a numerical weather prediction model for northern Switzerland. In contrast with the raw forecasts from the numerical model, the post-processed forecasts are calibrated and quantify prediction uncertainty. Moreover, they outperform the raw forecasts, in the sense that they have a lower mean absolute error.
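The reason spectral solution avoids error accumulation is that each Fourier mode of an advection–diffusion equation evolves independently and can be propagated exactly over any time step. A minimal deterministic sketch (one-dimensional periodic domain, illustrative coefficients, and none of the stochastic forcing of the actual SPDE model):

```python
import numpy as np

# One exact time step of du/dt = -a du/dx + kappa d2u/dx2 on a periodic grid:
# mode k is multiplied by exp(dt * (-i a k - kappa k^2)), so no discretization
# error builds up over repeated steps; the cost is dominated by the FFT.
n, a, kappa, dt = 256, 1.0, 0.01, 0.1
x = np.linspace(0, 2 * np.pi, n, endpoint=False)
u0 = np.exp(-10 * (x - np.pi) ** 2)                    # initial bump at pi
k = 2 * np.pi * np.fft.fftfreq(n, d=2 * np.pi / n)     # integer wavenumbers
u1 = np.fft.ifft(np.fft.fft(u0) * np.exp(dt * (-1j * a * k - kappa * k**2))).real
peak = x[np.argmax(u1)]
print(peak)            # the bump has been transported right by roughly a * dt
```

The same per-mode multiplication underlies the linear-in-dimension cost mentioned above: each of the *n* spectral coefficients is updated independently, and only the transforms between physical and spectral space cost more than O(*n*).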

In several areas of application ranging from brain imaging to astrophysics and geostatistics, an important statistical problem is to find regions where the process studied exceeds a certain level. Estimating such regions so that the probability for exceeding the level in the entire set is equal to some predefined value is a difficult problem connected to the problem of multiple significance testing. In this work, a method for solving this problem, as well as the related problem of finding credible regions for contour curves, for latent Gaussian models is proposed. The method is based on using a parametric family for the excursion sets in combination with a sequential importance sampling method for estimating joint probabilities. The accuracy of the method is investigated by using simulated data and an environmental application is presented.

The choice of the summary statistics that are used in Bayesian inference, and in particular in approximate Bayesian computation algorithms, has a bearing on the validity of the resulting inference. Those statistics are nonetheless customarily used in approximate Bayesian computation algorithms without consistency checks. We derive necessary and sufficient conditions on summary statistics for the corresponding Bayes factor to be convergent, namely to select the true model asymptotically. Those conditions, which amount to the expectations of the summary statistics differing asymptotically under the two models, are quite natural and can be exploited in approximate Bayesian computation settings to infer, via a Monte Carlo validation, whether or not a choice of summary statistics is appropriate.
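The condition that summary expectations differ asymptotically under the two models can be probed by simulation. The sketch below uses two hypothetical models matched in mean and variance (normal versus Laplace), so the sample mean is an uninformative summary while the mean absolute value discriminates; the models and statistics are illustrative assumptions, not the paper's examples.

```python
import numpy as np

def mean_gap(stat, sim1, sim2, n=200, reps=2000, seed=1):
    """Monte Carlo estimate of E_1[stat] - E_2[stat]; a gap bounded away
    from zero suggests the summary statistic can separate the two models."""
    rng = np.random.default_rng(seed)
    s1 = np.array([stat(sim1(n, rng)) for _ in range(reps)])
    s2 = np.array([stat(sim2(n, rng)) for _ in range(reps)])
    return float(s1.mean() - s2.mean())

normal = lambda n, rng: rng.standard_normal(n)
laplace = lambda n, rng: rng.laplace(0.0, 1 / np.sqrt(2), n)  # same mean, variance

gap_mean = mean_gap(np.mean, normal, laplace)                     # near zero
gap_abs = mean_gap(lambda x: np.mean(np.abs(x)), normal, laplace)  # bounded away
```

Here `gap_abs` reflects E|X| ≈ 0.798 under the normal model against ≈ 0.707 under the Laplace model, while `gap_mean` carries no discriminating signal.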

The paper focuses primarily on temperature extremes measured at 24 European stations with at least 90 years of data. Here, the term extremes refers to rare excesses of daily maxima and minima. As mean temperatures in this region have warmed over the last century, this positive shift can automatically be detected in extremes as well. After removing this warming trend, we focus on the question of whether other changes are still detectable in such extreme events. As we do not want to hypothesize any parametric form for such possible changes, we propose a new non-parametric estimator, based on the Kullback–Leibler divergence, tailored to extreme events. The properties of our estimator are studied theoretically and tested in a simulation study. Our approach is also applied to seasonal extremes of daily maxima and minima at our 24 selected stations.
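To fix ideas on divergence-based comparison of extremes, here is a deliberately crude plug-in Kullback–Leibler estimate between two samples of threshold exceedances, using shared histogram bins; it is not the paper's tailored estimator, and the exponential samples and threshold are assumptions for illustration.

```python
import numpy as np

def kl_hist(x, y, bins=20):
    """Crude plug-in estimate of KL(p || q) from two samples, via shared
    histogram bins with Laplace smoothing to avoid log(0); sketch only."""
    edges = np.linspace(min(x.min(), y.min()), max(x.max(), y.max()), bins + 1)
    p = np.histogram(x, edges)[0] + 1.0
    q = np.histogram(y, edges)[0] + 1.0
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

rng = np.random.default_rng(2)
u = 2.0                                              # threshold defining 'extreme'
a = rng.exponential(1.0, 20000); a = a[a > u] - u    # exceedances, rate 1
b = rng.exponential(0.5, 20000); b = b[b > u] - u    # exceedances, rate 2
```

By memorylessness the two exceedance samples are again exponential, so `kl_hist(a, b)` should be clearly positive while `kl_hist(a, a)` is zero.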

We investigate the estimation efficiency of the central mean subspace in the framework of sufficient dimension reduction. We derive the semiparametric efficient score and study its practical applicability. Despite the difficulty caused by the potentially high dimension of the variance component, we show that locally efficient estimators can be constructed in practice. We conduct simulation studies and a real data analysis to demonstrate the finite sample performance and gain in efficiency of the proposed estimators in comparison with several existing methods.

The emergence of the recent financial crisis, during which markets frequently underwent changes in their statistical structure over a short period of time, illustrates the importance of non-stationary modelling in financial time series. Motivated by this observation, we propose a fast, well performing and theoretically tractable method for detecting multiple change points in the structure of an auto-regressive conditional heteroscedastic model for financial returns with piecewise constant parameter values. Our method, termed BASTA (binary segmentation for transformed auto-regressive conditional heteroscedasticity), proceeds in two stages: process transformation and binary segmentation. The process transformation decorrelates the original process and lightens its tails; the binary segmentation consistently estimates the change points. We propose and justify two particular transformations and use simulation to fine-tune their parameters as well as the threshold parameter for the binary segmentation stage. A comparative simulation study illustrates good performance in comparison with the state of the art, and the analysis of the Financial Times Stock Exchange FTSE 100 index reveals an interesting correspondence between the estimated change points and major events of the recent financial crisis. The method is easy to implement, and ready-made R software is provided.
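The second stage, binary segmentation, can be sketched generically: compute a CUSUM statistic on a series that has already been transformed and decorrelated, split at the maximizer if it exceeds a threshold, and recurse on both halves. This is a textbook mean-change version with a hypothetical threshold, not BASTA's tuned procedure for ARCH parameters.

```python
import numpy as np

def cusum(x):
    """Absolute CUSUM statistics |C_t| for a single change in mean."""
    n = len(x)
    t = np.arange(1, n)
    left = np.cumsum(x)[:-1]
    total = x.sum()
    C = np.sqrt((n - t) / (n * t)) * left \
        - np.sqrt(t / (n * (n - t))) * (total - left)
    return np.abs(C)

def binary_segment(x, threshold, offset=0, out=None):
    """Recursively split at the CUSUM maximizer while it exceeds threshold."""
    out = [] if out is None else out
    if len(x) < 2:
        return out
    stats = cusum(x)
    k = int(np.argmax(stats))
    if stats[k] > threshold:
        out.append(offset + k + 1)                       # detected change point
        binary_segment(x[:k + 1], threshold, offset, out)
        binary_segment(x[k + 1:], threshold, offset + k + 1, out)
    return out
```

In BASTA the input to this stage is the transformed process, and the threshold is calibrated by simulation rather than fixed by hand.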

Monotonic transformations are widely employed in statistics and data analysis. In computer experiments they are often used to gain accuracy in the estimation of global sensitivity statistics. However, one faces the question of interpreting results that are obtained on the transformed data back on the original data. The situation is even more complex in computer experiments, because transformations alter the model input–output mapping and distort the estimators. This work demonstrates that the problem can be solved by utilizing statistics which are monotonic transformation invariant. To do so, we investigate families of metrics, based either on densities or on cumulative distribution functions, that are monotonic transformation invariant, and we introduce a new generalized family of metrics. Numerical experiments show that transformations allow numerical convergence of the estimates of global sensitivity statistics, both invariant and not, in cases in which it would otherwise be impossible to obtain convergence. However, the increased numerical accuracy is fully exploited only if the global sensitivity statistic is itself monotonic transformation invariant. Conversely, estimators of measures that do not have this invariance property might lead to misleading conclusions.
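A minimal illustration of the invariance property, using the rank correlation as a stand-in sensitivity measure rather than one of the paper's metrics: because ranks are preserved by strictly increasing transformations of the output, the Spearman coefficient is exactly invariant, while the Pearson correlation is not. The input–output model below is a hypothetical example.

```python
import numpy as np

def ranks(v):
    """Map each value to its rank 1..n (ties assumed absent)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(1, len(v) + 1)
    return r

def spearman(x, y):
    """Rank correlation: invariant under strictly increasing transforms."""
    return float(np.corrcoef(ranks(x), ranks(y))[0, 1])

rng = np.random.default_rng(4)
x = rng.standard_normal(2000)                    # model input
y = np.exp(x + 0.5 * rng.standard_normal(2000))  # positive model output
g = y ** 3                                       # monotonic transform of output

rho_y, rho_g = spearman(x, y), spearman(x, g)    # identical by construction
r_y = float(np.corrcoef(x, y)[0, 1])             # Pearson: changes under g
r_g = float(np.corrcoef(x, g)[0, 1])
```

Reading a non-invariant estimate such as `r_g` as a statement about the original output is exactly the kind of misleading deduction the abstract warns against.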

We introduce a new method for improving the coverage accuracy of confidence intervals for means of lattice distributions. The technique can be applied very generally to enhance existing approaches, although we consider it in greatest detail in the context of estimating a binomial proportion or a Poisson mean, where it is particularly effective. The method is motivated by a simple theoretical result, which shows that, by splitting the original sample of size *n* into two parts, of sizes *n*_{1} and *n*_{2}, and basing the confidence procedure on the average of the means of these two subsamples, the highly oscillatory behaviour of coverage error, as a function of *n*, is largely removed. Perhaps surprisingly, this approach does not increase confidence interval width; usually the width is slightly reduced. Contrary to what might be expected, our new method performs well when it is used to modify confidence intervals based on existing techniques that already perform very well; it typically improves their coverage accuracy significantly. Each application of the split sample method to an existing confidence interval procedure results in a new technique.
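The splitting idea can be sketched for a general mean with a normal-approximation interval; the choice of subsample sizes and the interval form below are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def split_sample_ci(x, alpha_z=1.959963984540054, seed=0):
    """Sketch of a split-sample interval for a mean: split the sample into
    two parts of unequal sizes, average the two subsample means, and form a
    normal-approximation interval around that average (illustration only)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = len(x)
    n1 = (n + 1) // 2                 # unequal part sizes when n is odd
    idx = rng.permutation(n)
    m1 = x[idx[:n1]].mean()
    m2 = x[idx[n1:]].mean()
    est = 0.5 * (m1 + m2)             # generally differs from the grand mean
    s2 = x.var(ddof=1)
    se = 0.5 * np.sqrt(s2 / n1 + s2 / (n - n1))  # s.e. of the averaged means
    return est - alpha_z * se, est + alpha_z * se

# e.g. a binomial proportion: 60 successes in 101 trials
x = np.r_[np.ones(60), np.zeros(41)]
lo, hi = split_sample_ci(x)
```

With equal part sizes the averaged estimator coincides with the grand mean; the unequal split is what perturbs the lattice structure responsible for the oscillating coverage error.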