Statistics Cluster Virtual Issue

FREE - Classic Journal Content 
 Celebrate the International Year of Statistics
 Statistics Journals
To celebrate the International Year of Statistics, we have created a special Virtual Issue of 20 classic papers from across our journal portfolio. You can read all of these articles FREE during 2013.
Applied Stochastic Models in Business and Industry 

Computational methods for discrete hidden semi-Markov chains
Yann Guédon  

From the abstract: We propose a computational approach for implementing discrete hidden semi-Markov chains. A discrete hidden semi-Markov chain is composed of a non-observable or hidden process which is a finite semi-Markov chain and a discrete observable process. Hidden semi-Markov chains possess both the flexibility of hidden Markov chains for approximating complex probability distributions and the flexibility of semi-Markov chains for representing temporal structures.

Australian and New Zealand Journal of Statistics  

Practical maximum pseudolikelihood for spatial point patterns 
Adrian Baddeley, Rolf Turner

From the abstract: This paper describes a technique for computing approximate maximum pseudolikelihood estimates of the parameters of a spatial point process. The method is an extension of Berman & Turner's (1992) device for maximizing the likelihoods of inhomogeneous spatial Poisson processes. For a very wide class of spatial point process models the likelihood is intractable, while the pseudolikelihood is known explicitly, except for the computation of an integral over the sampling region.

Biometrical Journal  

Some Methods of Propensity-Score Matching had Superior Performance to Others: Results of an Empirical Investigation and Monte Carlo simulations
Peter C. Austin  

From the abstract: Propensity-score matching is increasingly being used to reduce the impact of treatment-selection bias when estimating causal treatment effects using observational data. Several propensity-score matching methods are currently employed in the medical literature: matching on the logit of the propensity score using calipers of width either 0.2 or 0.6 of the standard deviation of the logit of the propensity score; matching on the propensity score using calipers of 0.005, 0.01, 0.02, 0.03, and 0.1; and 5 → 1 digit matching on the propensity score.
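To make one of the listed methods concrete, here is a minimal sketch of greedy 1:1 nearest-neighbour matching without replacement on the logit of the propensity score, with a caliper of 0.2 standard deviations of the logit. It assumes the propensity scores have already been estimated (for example by logistic regression). Using the standard deviation of all pooled logits, the function name `caliper_match` and the toy scores are illustrative assumptions, not Austin's implementation.

```python
import math
import statistics


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def caliper_match(treated, control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement (hypothetical helper).

    treated, control: dicts mapping subject id -> estimated propensity score.
    A pair is accepted only if the two logits differ by at most
    caliper_sd * (standard deviation of all pooled logits), an assumed convention.
    Returns a list of (treated_id, control_id) pairs.
    """
    logits_t = {k: logit(p) for k, p in treated.items()}
    logits_c = {k: logit(p) for k, p in control.items()}
    pooled = list(logits_t.values()) + list(logits_c.values())
    caliper = caliper_sd * statistics.stdev(pooled)

    available = dict(logits_c)  # controls not yet used
    pairs = []
    # Treated subjects are processed in sorted-id order; greedy results depend on order.
    for t_id in sorted(logits_t):
        if not available:
            break
        lt = logits_t[t_id]
        c_id = min(available, key=lambda c: abs(available[c] - lt))
        if abs(available[c_id] - lt) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # matching without replacement
    return pairs


treated = {"t1": 0.30, "t2": 0.70, "t3": 0.95}
control = {"c1": 0.31, "c2": 0.68, "c3": 0.10, "c4": 0.50}
print(caliper_match(treated, control))
```

With these made-up scores, the treated subject with score 0.95 has no control within the caliper and stays unmatched. Because the matching is greedy, changing the order in which treated subjects are processed can change which pairs are formed.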


PICS: Probabilistic Inference for ChIP-seq
Xuekui Zhang, Gordon Robertson, et al.

From the abstract: ChIP-seq combines chromatin immunoprecipitation with massively parallel short-read sequencing. While it can profile genome-wide in vivo transcription factor-DNA association with higher sensitivity, specificity, and spatial resolution than ChIP-chip, it poses new challenges for statistical analysis that derive from the complexity of the biological systems characterized and from variability and biases in its sequence data. We propose a method called PICS (Probabilistic Inference for ChIP-seq) for identifying regions bound by transcription factors from aligned reads.

Canadian Journal of Statistics  

Beyond Kappa: A Review of Interrater Agreement Measures
Mousumi Banerjee, Michelle Capozzoli, et al.

From the abstract: In 1960, Cohen introduced the kappa coefficient to measure chance-corrected nominal scale agreement between two raters. Since then, numerous extensions and generalizations of this interrater agreement measure have been proposed in the literature. This paper reviews and critiques various approaches to the study of interrater agreement, for which the relevant data comprise either nominal or ordinal categorical ratings from multiple raters.
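The review's starting point, Cohen's kappa, compares observed agreement p_o with the agreement p_e expected by chance from the two raters' marginal frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for two raters and nominal categories follows; the function name and the example ratings are hypothetical, and the weighted and multi-rater extensions the paper reviews are not covered.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters assigning nominal categories."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items both raters put in the same category.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of marginal proportions.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings of 50 items: 20 yes/yes, 5 yes/no, 10 no/yes, 15 no/no.
a = ["yes"] * 20 + ["yes"] * 5 + ["no"] * 10 + ["no"] * 15
b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
print(cohens_kappa(a, b))
```

Here the raters agree on 70% of items, but half of that agreement is expected by chance alone (p_e = 0.5), so kappa is 0.4 rather than 0.7.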


Large scale wildlife monitoring studies: statistical methods for design and analysis
Kenneth H. Pollock, James D. Nichols et al.

From the abstract: Techniques for estimation of absolute abundance of wildlife populations have received a lot of attention in recent years. The statistical research has been focused on intensive small-scale studies. Recently, however, wildlife biologists have desired to study populations of animals at very large scales for monitoring purposes. Population indices are widely used in these extensive monitoring programs because they are inexpensive compared to estimates of absolute abundance.

International Statistical Review  

Econometric Causality
James J. Heckman  

From the abstract: This paper presents the econometric approach to causal modelling. It is motivated by policy problems. New causal parameters are defined and identified to address specific policy problems. Economists embrace a scientific approach to causality and model the preferences and choices of agents to infer subjective (agent) evaluations as well as objective outcomes. Anticipated and realized subjective and objective outcomes are distinguished. Models for simultaneous causality are developed. The paper contrasts the Neyman–Rubin model of causality with the econometric approach.

Journal of Chemometrics  

Orthogonal projections to latent structures 
Johan Trygg, Svante Wold

From the abstract: A generic preprocessing method for multivariate data, called orthogonal projections to latent structures (O-PLS), is described. O-PLS removes variation from X (descriptor variables) that is not correlated to Y (property variables, e.g. yield, cost or toxicity). In mathematical terms this is equivalent to removing systematic variation in X that is orthogonal to Y. In an earlier paper, Wold et al. (Chemometrics Intell. Lab. Syst. 1998; 44: 175–185) described orthogonal signal correction (OSC). In this paper a method with the same objective but with different means is described.

Journal of Time Series Analysis  

Break Detection for a Class of Nonlinear Time Series Models 
Richard A. Davis, Thomas C. M. Lee et al.

From the abstract: This article considers the problem of detecting break points for a nonstationary time series. Specifically, the time series is assumed to follow a parametric nonlinear time-series model in which the parameters may change values at fixed times. In this formulation, the number and locations of the break points are assumed unknown. The minimum description length (MDL) is used as a criterion for estimating the number of break points, the locations of break points and the parametric model in each segment. The best segmentation found by minimizing MDL is obtained using a genetic algorithm.


An exact algorithm for the elementary shortest path problem with resource constraints: Application to some vehicle routing problems
Dominique Feillet, Pierre Dejax et al.

From the abstract: In this article, we propose a solution procedure for the Elementary Shortest Path Problem with Resource Constraints (ESPPRC). A relaxed version of this problem in which the path does not have to be elementary has been the backbone of a number of solution procedures based on column generation for several important problems, such as vehicle routing and crew pairing. In many cases relaxing the restriction of an elementary path resulted in optimal solutions in a reasonable computation time.

Naval Research Logistics  

The importance of decoupling recurrent and disruption risks in a supply chain  
Sunil Chopra, Gilles Reinhardt et al.

From the abstract: This paper focuses on the importance of decoupling recurrent supply risk and disruption risk when planning appropriate mitigation strategies. We show that bundling the two uncertainties leads a manager to underutilize a reliable source while overutilizing a cheaper but less reliable supplier. As in Dada et al. (working paper, University of Illinois, Champaign, IL, 2003), we show that increasing quantity from a cheaper but less reliable source is an effective risk mitigation strategy if most of the supply risk growth comes from an increase in recurrent uncertainty.

Pharmaceutical Statistics  

Reporting cumulative proportion of subjects with an adverse event based on data from multiple studies 
Christy Chuang-Stein, Mohan Beltangady

From the abstract: Experience has shown us that when data are pooled from multiple studies to create an integrated summary, an analysis based on naïvely pooled data is vulnerable to the mischief of Simpson's Paradox. Using the proportions of patients with a target adverse event (AE) as an example, we demonstrate the Paradox's effect on both the comparison and the estimation of the proportions. While meta-analytic approaches have been recommended and increasingly used for comparing safety data between treatments, reporting proportions of subjects experiencing a target AE based on data from multiple studies has received little attention.
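The Simpson's Paradox effect described above is easy to reproduce with made-up numbers: if the drug arm is over-represented in a low-risk study, naïve pooling can make the drug look safer even though its AE proportion is higher in every study. The sketch below contrasts naïve pooling with a Mantel-Haenszel risk difference, one standard stratified estimator that keeps comparisons within studies; the counts are entirely hypothetical, and the estimator is a common textbook choice rather than necessarily the approach the authors recommend.

```python
# Hypothetical (made-up) counts: (subjects with the AE, subjects) per arm, per study.
STUDIES = {
    "A": {"drug": (8, 200), "placebo": (1, 50)},   # low-risk population, 4:1 randomization
    "B": {"drug": (10, 50), "placebo": (30, 200)},  # high-risk population, 1:4 randomization
}


def naive_pooled_rate(studies, arm):
    """AE proportion after simply adding counts across studies (ignores study)."""
    events = sum(s[arm][0] for s in studies.values())
    subjects = sum(s[arm][1] for s in studies.values())
    return events / subjects


def mantel_haenszel_risk_difference(studies):
    """Drug-minus-placebo risk difference, weighted within studies by n_t*n_c/(n_t+n_c)."""
    num = den = 0.0
    for s in studies.values():
        (e_t, n_t), (e_c, n_c) = s["drug"], s["placebo"]
        w = n_t * n_c / (n_t + n_c)
        num += w * (e_t / n_t - e_c / n_c)
        den += w
    return num / den


naive_diff = naive_pooled_rate(STUDIES, "drug") - naive_pooled_rate(STUDIES, "placebo")
print(naive_diff, mantel_haenszel_risk_difference(STUDIES))
```

Naïve pooling gives 7.2% versus 12.4% (drug apparently safer), while the within-study risk differences are +2 and +5 percentage points and the stratified estimate is +3.5 points.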

Quality and Reliability Engineering International  

An explanation and critique of Taguchi's contributions to quality engineering
George Box, Søren Bisgaard et al.

From the abstract: Recently there has been much interest and some controversy concerning the statistical methods employed by Professor Genichi Taguchi of Japan for improving the quality of products and processes. These methods include the use of fractional factorial designs and other orthogonal arrays, parameter design to minimize sensitivity to environmental factors, parameter design for minimizing transmitted variation, signal-to-noise ratios, loss functions, accumulation analysis, minute analysis and the analysis of life test data. This paper explains some of Taguchi's contributions to quality engineering and also provides a critical evaluation of his statistical methods.
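Of the methods listed above, the signal-to-noise ratios are the easiest to state compactly. The sketch below computes the three forms commonly quoted in the quality-engineering literature (smaller-the-better, larger-the-better, nominal-the-best) so that the object of the critique is concrete. These are the usual textbook definitions, assumed here rather than taken from the paper, and the function names are hypothetical.

```python
import math
import statistics


def sn_smaller_the_better(y):
    """-10 log10(mean of y^2): higher when responses are close to zero."""
    return -10.0 * math.log10(sum(v * v for v in y) / len(y))


def sn_larger_the_better(y):
    """-10 log10(mean of 1/y^2): higher when responses are large (y must be non-zero)."""
    return -10.0 * math.log10(sum(1.0 / (v * v) for v in y) / len(y))


def sn_nominal_the_best(y):
    """10 log10(mean^2 / s^2): higher when the spread is small relative to the mean."""
    mean = statistics.mean(y)
    var = statistics.variance(y)  # sample variance (n - 1 denominator)
    return 10.0 * math.log10(mean * mean / var)


print(sn_smaller_the_better([10, 10]))   # about -20 dB
print(sn_larger_the_better([10, 10]))    # about 20 dB
print(sn_nominal_the_best([9, 10, 11]))  # about 20 dB
```

Each ratio collapses location and dispersion into a single number on a decibel scale; analysing the mean and the variance separately is the usual alternative.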

Research Synthesis Methods  

A basic introduction to fixed-effect and random-effects models for meta-analysis 
Michael Borenstein, Larry V. Hedges, et al.

From the abstract: There are two popular statistical models for meta-analysis, the fixed-effect model and the random-effects model. The fact that these two models employ similar sets of formulas to compute statistics, and sometimes yield similar estimates for the various parameters, may lead people to believe that the models are interchangeable. In fact, though, the models represent fundamentally different assumptions about the data. The selection of the appropriate model is important to ensure that the various statistics are estimated correctly. Additionally, and more fundamentally, the model serves to place the analysis in context.
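The difference between the two models shows up directly in the weights. The fixed-effect model weights each study by 1/v_i (its within-study variance), while the random-effects model adds a between-study variance tau^2 and weights by 1/(v_i + tau^2). The sketch below uses the DerSimonian-Laird moment estimator for tau^2, a common choice assumed here for illustration; the function names and the example numbers are hypothetical.

```python
import math


def fixed_effect(estimates, variances):
    """Inverse-variance weighted mean and its standard error (fixed-effect model)."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return mean, math.sqrt(1.0 / sum(weights))


def random_effects_dl(estimates, variances):
    """Random-effects mean, its SE, and tau^2 from the DerSimonian-Laird estimator."""
    k = len(estimates)
    weights = [1.0 / v for v in variances]
    fe_mean, _ = fixed_effect(estimates, variances)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fe_mean) ** 2 for w, y in zip(weights, estimates))
    c = sum(weights) - sum(w * w for w in weights) / sum(weights)
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    re_weights = [1.0 / (v + tau2) for v in variances]
    mean = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    return mean, math.sqrt(1.0 / sum(re_weights)), tau2


effects = [0.1, 0.3, 0.5]        # hypothetical study effect sizes
within_var = [0.01, 0.01, 0.01]  # hypothetical within-study variances
print(fixed_effect(effects, within_var))
print(random_effects_dl(effects, within_var))
```

With these numbers both models give the same mean (0.3), but the random-effects standard error is twice as large because tau^2 = 0.03. When the studies are homogeneous, tau^2 is truncated to zero and the two models coincide, which is one reason they can look interchangeable.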

Scandinavian Journal of Statistics  

Normal inverse Gaussian distributions and stochastic volatility modelling 
Ole E. Barndorff-Nielsen

From the abstract: The normal inverse Gaussian distribution is defined as a variance-mean mixture of a normal distribution with the inverse Gaussian as the mixing distribution. The distribution determines a homogeneous Lévy process, and this process is representable through subordination of Brownian motion by the inverse Gaussian process. The canonical, Lévy type, decomposition of the process is determined. As a preparation for developments in the latter part of the paper the connection of the normal inverse Gaussian distribution to the classes of generalized hyperbolic and inverse Gaussian distributions is briefly reviewed.
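The variance-mean mixture definition translates directly into a sampler: draw a mixing variable Z from an inverse Gaussian distribution, then draw X from a normal distribution whose mean and variance both depend on Z. The sketch assumes the common (alpha, beta, mu, delta) parameterization with gamma = sqrt(alpha^2 - beta^2), Z ~ IG(mean delta/gamma, shape delta^2) and X | Z ~ N(mu + beta*Z, Z). It uses NumPy's `Generator.wald`, whose `scale` argument is the inverse Gaussian shape parameter; the helper names are hypothetical.

```python
import numpy as np


def sample_nig(alpha, beta, mu, delta, size, rng=None):
    """Draw NIG(alpha, beta, mu, delta) variates as a normal variance-mean mixture."""
    if not (alpha > abs(beta) and delta > 0):
        raise ValueError("need alpha > |beta| and delta > 0")
    rng = np.random.default_rng() if rng is None else rng
    gamma = np.sqrt(alpha**2 - beta**2)
    # Mixing variable: inverse Gaussian with mean delta/gamma and shape delta^2.
    z = rng.wald(mean=delta / gamma, scale=delta**2, size=size)
    # Conditional on z: normal with mean mu + beta*z and variance z.
    return mu + beta * z + np.sqrt(z) * rng.standard_normal(size)


def nig_mean_var(alpha, beta, mu, delta):
    """Theoretical mean and variance under the same parameterization."""
    gamma = np.sqrt(alpha**2 - beta**2)
    return mu + delta * beta / gamma, delta * alpha**2 / gamma**3


x = sample_nig(2.0, 0.5, 0.0, 1.0, size=200_000, rng=np.random.default_rng(0))
print(x.mean(), x.var(), nig_mean_var(2.0, 0.5, 0.0, 1.0))
```

The sample mean and variance should be close to the theoretical values; a non-zero beta skews the distribution because larger mixing draws also shift the mean.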


Robbing banks: Crime does pay – but not very much
Barry Reilly, Neil Rickman et al.

From the abstract: Robbing a bank is the staple crime of thrillers, movies and newspapers. But, say Barry Reilly, Neil Rickman and Robert Witt, bank robbery is not all it is cracked up to be. With access to a unique data set, they give us the low-down on the economics of the bank heist.

Statistica Neerlandica 

Non- and semi-parametric estimation of interaction in inhomogeneous point patterns 
A. J. Baddeley, J. Møller et al.

From the abstract:  We develop methods for analysing the ‘interaction’ or dependence between points in a spatial point pattern, when the pattern is spatially inhomogeneous. Completely non-parametric study of interactions is possible using an analogue of the K-function. Alternatively one may assume a semi-parametric model in which a (parametrically specified) homogeneous Markov point process is subjected to (non-parametric) inhomogeneous independent thinning. The effectiveness of these approaches is tested on datasets representing the positions of trees in forests.

Statistical Analysis and Data Mining  

A general framework for efficient clustering of large datasets based on activity detection 
Xin Jin, Sangkyum Kim et al.

From the abstract: Data clustering is one of the most popular data mining techniques with broad applications. K-Means is one of the most popular clustering algorithms, due to its high efficiency/effectiveness and wide implementation in many commercial/noncommercial software packages. Performing efficient clustering on large datasets is especially useful; however, conducting K-Means clustering on large data suffers from a heavy computational burden, which originates from the numerous distance calculations between the patterns and the centers. This paper proposes a framework, General Activity Detection (GAD), for fast clustering on large-scale data based on center activity detection.
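The cost the abstract refers to is visible in plain K-means (Lloyd's algorithm): every iteration computes the distance from each of the n patterns to each of the k centers. The sketch below is that unaccelerated baseline, not the GAD framework itself; the function name, the random initialization and the toy points are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-means on a list of coordinate tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: n * k distance computations per iteration.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Update step: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # leave an empty cluster's center in place
        if new_labels == labels and new_centers == centers:
            break  # nothing changed: converged
        labels, centers = new_labels, new_centers
    return centers, labels


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, 2)
print(centers, labels)
```

Acceleration schemes aim to skip many of the n * k distance computations in the assignment step, for example for centers that have stopped moving, while keeping the same result.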

Teaching Statistics  

Probability With Less Pain
Maxine Pfannkuch, George A. F. Seber, et al.

From the abstract: The teaching of probability theory has been steadily declining in introductory statistics courses as students have difficulty handling the rules of probability. In this article, we give a data-driven approach, based on two-way tables, which helps students to become familiar with using the usual rules but without the formal structure.
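To show the two-way-table idea in practice, here is a small sketch with made-up counts. Every probability is a count divided by a total, so the conditional-probability, multiplication and addition rules can be checked directly against the table. The table, its labels and the helper names are hypothetical and are not taken from the article.

```python
from fractions import Fraction

# Hypothetical two-way table of counts: rows = tutorial attendance, columns = exam result.
TABLE = {
    ("attended", "passed"): 60,
    ("attended", "failed"): 20,
    ("skipped", "passed"): 40,
    ("skipped", "failed"): 80,
}
TOTAL = sum(TABLE.values())  # 200 students


def prob(row=None, col=None):
    """P(row and col) as an exact fraction; None means 'any value'."""
    count = sum(n for (r, c), n in TABLE.items()
                if (row is None or r == row) and (col is None or c == col))
    return Fraction(count, TOTAL)


def cond_prob(col, given_row):
    """P(col | row): the cell count divided by the row total."""
    return prob(given_row, col) / prob(given_row)


print(prob(col="passed"), cond_prob("passed", "attended"), cond_prob("passed", "skipped"))
```

Reading P(passed | attended) as 60 out of 80 students gives 3/4 without any formula, and the same numbers confirm that P(attended and passed) = P(attended) × P(passed | attended).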

WIREs Computational Statistics  

Principal component analysis
Hervé Abdi, Lynne J. Williams

From the abstract:  Principal component analysis (PCA) is a multivariate technique that analyzes a data table in which observations are described by several inter-correlated quantitative dependent variables. Its goal is to extract the important information from the table, to represent it as a set of new orthogonal variables called principal components, and to display the pattern of similarity of the observations and of the variables as points in maps. The quality of the PCA model can be evaluated using cross-validation techniques such as the bootstrap and the jackknife. PCA can be generalized as correspondence analysis (CA) in order to handle qualitative variables and as multiple factor analysis (MFA) in order to handle heterogeneous sets of variables. Mathematically, PCA depends upon the eigen-decomposition of positive semi-definite matrices and upon the singular value decomposition (SVD) of rectangular matrices.
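As the abstract notes, PCA can be computed from the singular value decomposition of the (centered) data table. The sketch below centers each variable, takes the SVD with NumPy, and returns factor scores for the observations, orthonormal loadings, and eigenvalues. Dividing the squared singular values by n - 1 so that the eigenvalues equal the component variances is an assumed convention; other scalings are common, and the function name is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """PCA of an (observations x variables) table via the SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                          # center each variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # factor scores of the observations
    loadings = Vt[:n_components].T                   # orthonormal component directions
    eigenvalues = s**2 / (X.shape[0] - 1)            # variance along each component
    return scores, loadings, eigenvalues[:n_components]


rng = np.random.default_rng(1)
mix = np.array([[2.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.5, 0.2]])
X = rng.normal(size=(50, 3)) @ mix  # hypothetical inter-correlated variables
scores, loadings, eigenvalues = pca(X, 3)
print(eigenvalues)
```

The eigenvalues match those of the sample covariance matrix, which ties the SVD route back to the eigen-decomposition of a positive semi-definite matrix mentioned in the abstract.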
