Regularized Sandwich Estimators for Analysis of High-Dimensional Data Using Generalized Estimating Equations


  • David I. Warton

    1. School of Mathematics and Statistics and Evolution and Ecology Research Centre, The University of New South Wales, NSW 2052, Australia



Summary. A modification of generalized estimating equation (GEE) methodology is proposed for hypothesis testing with high-dimensional data, with particular interest in multivariate abundance data in ecology, a type of data collected in thousands of environmental science studies. Such data are typically counts characterized by high dimensionality (in the sense that cluster size exceeds the number of clusters, n>K) and by over-dispersion relative to the Poisson distribution. Usual GEE methods cannot be applied in this setting, primarily because sandwich estimators become numerically unstable as n increases. We propose instead a regularized sandwich estimator that assumes a common correlation matrix R and shrinks the sample estimate of R toward the working correlation matrix to improve its numerical stability. It is shown via theory and simulation that this substantially improves the power of Wald statistics when cluster size is not small. We apply the proposed approach to study the effects of nutrient addition on nematode communities, and in doing so discuss important issues in implementation, such as using statistics that have good properties when parameter estimates approach the boundary (inline image), and using resampling to enable valid inference that is robust to high dimensionality and to possible model misspecification.
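The shrinkage idea in the summary can be illustrated with a small numerical sketch. This is not the paper's estimator: it assumes a simple convex combination of the sample correlation and the working correlation, with a hypothetical weight `lam`, simulated standard-normal residuals, and an identity working correlation. It only shows why shrinkage helps when cluster size n exceeds the number of clusters K: the sample correlation matrix is then rank-deficient, while the shrunken matrix is well conditioned.

```python
import numpy as np

def shrink_correlation(residuals, working_corr, lam):
    """Shrink the sample correlation toward a working correlation.

    residuals    : (K, n) array, one row per cluster, one column per response.
    working_corr : (n, n) working correlation matrix.
    lam          : weight in [0, 1] on the sample estimate (hypothetical name;
                   the form of shrinkage here is an assumption for illustration).
    """
    sample_corr = np.corrcoef(residuals, rowvar=False)
    return lam * sample_corr + (1.0 - lam) * working_corr

# Simulated setting with cluster size n larger than number of clusters K.
rng = np.random.default_rng(0)
K, n = 10, 30
residuals = rng.standard_normal((K, n))   # stand-in for standardized residuals
working = np.eye(n)                       # independence working correlation

sample_corr = np.corrcoef(residuals, rowvar=False)
shrunk = shrink_correlation(residuals, working, lam=0.5)

# The sample estimate is singular when n > K; the shrunken one is not.
rank_sample = np.linalg.matrix_rank(sample_corr)
min_eig_shrunk = np.linalg.eigvalsh(shrunk).min()
print("rank of sample correlation:", rank_sample, "of", n)
print("smallest eigenvalue after shrinkage:", min_eig_shrunk)
```

In this sketch the smallest eigenvalue of the shrunken matrix is bounded below by `1 - lam`, since the sample correlation is positive semidefinite and the working matrix is the identity. In practice the weight would need to be chosen from the data, which this sketch does not attempt.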