We update a previous approach to the estimation of the size of an open population when there are multiple lists at each time point. Our motivation is 35 years of longitudinal data on the detection of drug users by the Central Registry of Drug Abuse in Hong Kong. We develop a two-stage smoothing spline approach. This gives a flexible and easily implemented alternative to the previous method, which was based on kernel smoothing. The new method retains the property of reducing the variability of the individual estimates at each time point. We evaluate the new method by means of a simulation study that includes an examination of the effects of variable selection. The new method is then applied to data collected by the Central Registry of Drug Abuse. The parameter estimates obtained are compared with the well-known Jolly–Seber estimates based on single capture methods.

Remote sensing of the Earth with satellites yields datasets that can be massive in size, nonstationary in space, and non-Gaussian in distribution. To overcome computational challenges, we use the reduced-rank spatial random effects (SRE) model in a statistical analysis of cloud-mask data from NASA's Moderate Resolution Imaging Spectroradiometer (MODIS) instrument on board NASA's Terra satellite. Parameterisations of cloud processes are the biggest source of uncertainty and sensitivity in different climate models' future projections of Earth's climate. An accurate quantification of the spatial distribution of clouds, as well as a rigorously estimated pixel-scale clear-sky-probability process, is needed to establish reliable estimates of cloud-distributional changes and trends caused by climate change. Here we give a hierarchical spatial-statistical modelling approach for a very large spatial dataset of 2.75 million pixels, corresponding to a granule of MODIS cloud-mask data, and we use spatial change-of-support relationships to estimate cloud fraction at coarser resolutions. Our model is non-Gaussian; it postulates a hidden process for the clear-sky probability that makes use of the SRE model, EM-estimation, and optimal (empirical Bayes) spatial prediction of the clear-sky-probability process. Measures of prediction uncertainty are also given.

We describe a class of random field models for geostatistical count data based on Gaussian copulas. Unlike hierarchical Poisson models often used to describe this type of data, Gaussian copula models allow a more direct modelling of the marginal distributions and association structure of the count data. We study in detail the correlation structure of these random fields when the family of marginal distributions is either negative binomial or zero-inflated Poisson; these represent two types of overdispersion often encountered in geostatistical count data. We also contrast the correlation structure of one of these Gaussian copula models with that of a hierarchical Poisson model having the same family of marginal distributions, and show that the former is more flexible than the latter in terms of range of feasible correlation, sensitivity to the mean function and modelling of isotropy. An exploratory analysis of a dataset of Japanese beetle larvae counts illustrates some of the findings. All of these investigations show that Gaussian copula models are useful alternatives to hierarchical Poisson models, especially for geostatistical count data that display substantial correlation and small overdispersion.
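
As a rough illustration of the construction described in this abstract, the following sketch draws a pair of correlated counts from a Gaussian copula with negative binomial marginals: two correlated standard normal values are pushed through the normal distribution function and then through the negative binomial quantile function. The function names, the parameter values and the restriction to two locations are assumptions made for the example and are not taken from the paper.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def negbin_pmf(k, r, p):
    """P(K = k) for a negative binomial with size r > 0 and probability p."""
    log_pmf = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
               + r * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def negbin_quantile(u, r, p, max_k=10_000):
    """Smallest k whose cumulative probability is at least u."""
    k = 0
    cdf = negbin_pmf(0, r, p)
    while cdf < u and k < max_k:  # max_k guards against u rounding to 1.0
        k += 1
        cdf += negbin_pmf(k, r, p)
    return k


def simulate_pair(rho, r, p, rng):
    """One pair of counts whose latent Gaussian correlation is rho."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    return (negbin_quantile(normal_cdf(z1), r, p),
            negbin_quantile(normal_cdf(z2), r, p))


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(1)
    pairs = [simulate_pair(0.8, 2.0, 0.4, rng) for _ in range(5000)]
    counts_a = [a for a, _ in pairs]
    counts_b = [b for _, b in pairs]
    print("sample correlation of counts:", pearson(counts_a, counts_b))
```

In this sketch the marginal distribution is fixed by `r` and `p` alone, while `rho` controls only the association, which is the separation of marginals and dependence that the abstract refers to. The correlation of the counts is generally smaller in magnitude than the latent `rho`.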

This paper provides an information theoretic analysis of the signal identification problem in singular spectrum analysis (SSA). We present a signal-plus-noise model based on the Karhunen-Loève expansion and use this model to motivate the construction of a minimum description length criterion that can be employed to identify the dimension (rank) of the signal component. We show that under very general regularity conditions the criterion will identify the true signal dimension with probability one as the sample size increases. A by-product of this analysis is a procedure for selecting a window length consistent with the Whitney embedding theorem. The upshot is a modelling strategy that results in a specification that yields a signal-noise reconstruction that minimises mean squared reconstruction error. Empirical results obtained using simulated and real-world data series indicate that theoretical properties presented in the paper are reflected in observed behaviour, even in relatively small samples, and that the minimum description length modelling strategy provides the practitioner with an effective addition to the SSA toolbox.

Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large and hence interesting, it might be informative to partition the quadratic form into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition that is determined will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that maximises the sum of the correlations between individual variables and the variables to which they transform under a constraint. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear-cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is adapted so that it has an invariance property under such rotation, whereby the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one- and two-sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a partition are readily obtained.
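
To make the idea of partitioning a quadratic form concrete, the sketch below splits a two-variable Mahalanobis-type quadratic form x' S^{-1} x into one non-negative contribution per variable, by transforming x with the symmetric inverse square root of S and squaring the components. This is one simple orthogonalising transformation chosen for illustration, and it is an assumption of the example; it is not claimed to be the exact transformation or criterion derived in the paper. The function names and the two-variable restriction are likewise hypothetical.

```python
import math


def inv_sqrt_2x2(s11, s12, s22):
    """Symmetric inverse square root of the SPD matrix [[s11, s12], [s12, s22]].

    Uses the closed form sqrt(S) = (S + s*I) / t for 2x2 matrices,
    with s = sqrt(det S) and t = sqrt(trace S + 2*s).
    Returns the entries (a11, a12, a22) of S^{-1/2}.
    """
    det = s11 * s22 - s12 * s12
    if s11 <= 0 or det <= 0:
        raise ValueError("matrix must be symmetric positive definite")
    s = math.sqrt(det)
    t = math.sqrt(s11 + s22 + 2.0 * s)
    r11, r12, r22 = (s11 + s) / t, s12 / t, (s22 + s) / t
    rdet = r11 * r22 - r12 * r12  # determinant of sqrt(S)
    return r22 / rdet, -r12 / rdet, r11 / rdet


def quadratic_form(x, cov):
    """Value of x' S^{-1} x for a 2-vector x and cov = (s11, s12, s22)."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    return (s22 * x[0] ** 2 - 2.0 * s12 * x[0] * x[1] + s11 * x[1] ** 2) / det


def partition(x, cov):
    """Per-variable contributions w_i^2, where w = S^{-1/2} x."""
    a11, a12, a22 = inv_sqrt_2x2(*cov)
    w1 = a11 * x[0] + a12 * x[1]
    w2 = a12 * x[0] + a22 * x[1]
    return [w1 * w1, w2 * w2]


if __name__ == "__main__":
    cov = (4.0, 1.2, 1.0)   # illustrative covariance entries
    x = (2.0, 1.0)          # illustrative observation (already centred)
    parts = partition(x, cov)
    print("contributions:", parts)
    print("sum:", sum(parts), "quadratic form:", quadratic_form(x, cov))
```

Because S^{-1/2} is symmetric and squares to S^{-1}, the contributions always add up to the quadratic form. When the variables are uncorrelated, each contribution reduces to the squared standardised value of that variable.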

One of the standard variable selection procedures in multiple linear regression is to use a penalisation technique in least-squares (LS) analysis. In this setting, many different types of penalties have been introduced to achieve variable selection. It is well known that LS analysis is sensitive to outliers, and consequently outliers can present serious problems for the classical variable selection procedures. Since rank-based procedures have desirable robustness properties compared to LS procedures, we propose a rank-based adaptive lasso-type penalised regression estimator and a corresponding variable selection procedure for linear regression models. The proposed estimator and variable selection procedure are robust against outliers in both response and predictor space. Furthermore, since rank regression can yield unstable estimators in the presence of multicollinearity, we adjust the penalty term in the adaptive lasso function by incorporating the standard errors of the rank estimator, in order to provide inference that is robust against multicollinearity. The theoretical properties of the proposed procedures are established and their performance is investigated by means of simulations. Finally, the estimator and variable selection procedure are applied to the Plasma Beta-Carotene Level data set.