Fast forward selection for generalized estimating equations with a large number of predictor variables



We propose a new variable selection criterion designed for use with forward selection algorithms: the score information criterion (SIC). The proposed criterion is based on score statistics that incorporate correlated response data. The main advantage of the SIC is that it is much faster to compute than existing model selection criteria when the number of predictor variables to be added to a model is large, because the SIC can be computed for all candidate models without actually fitting them. A second advantage is that it incorporates the correlation between variables into its quasi-likelihood, leading to more desirable properties than competing selection criteria. We establish consistency and prediction properties of the SIC. We conduct simulation studies to evaluate its selection and prediction performance and compare these, as well as computation times, with those of some well-known variable selection criteria. We apply the SIC to a real data set on arthropods, performing variable selection over a large number of interaction terms between species traits and environmental covariates.
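To illustrate why score-based screening avoids refitting, the following minimal sketch computes a generic score statistic for every candidate predictor from a single fit of the current model. It is not the SIC itself: it assumes a Gaussian response, an identity link, and an independence working correlation, and it omits any penalty term. The function name `score_statistics` and the simulated data are hypothetical and chosen only for illustration.

```python
import numpy as np

def score_statistics(X0, candidates, y):
    """Score statistic for adding each candidate column to the model y ~ X0.

    Simplifying assumptions (not from the paper): Gaussian response,
    identity link, independent observations.
    """
    # Fit the current model once by ordinary least squares.
    beta, *_ = np.linalg.lstsq(X0, y, rcond=None)
    resid = y - X0 @ beta
    sigma2 = resid @ resid / (len(y) - X0.shape[1])

    # Remove from each candidate the part already explained by X0.
    coef, *_ = np.linalg.lstsq(X0, candidates, rcond=None)
    cand_resid = candidates - X0 @ coef

    # Score and information for each candidate; no candidate model is fitted.
    score = cand_resid.T @ resid
    info = np.sum(cand_resid ** 2, axis=0)
    return score ** 2 / (sigma2 * info)

# Simulated example: only candidate column 3 affects the response.
rng = np.random.default_rng(0)
n = 500
X0 = np.ones((n, 1))                      # current model: intercept only
Z = rng.normal(size=(n, 20))              # 20 candidate predictors
y = 1.0 + 2.0 * Z[:, 3] + rng.normal(size=n)

stats = score_statistics(X0, Z, y)
best = int(np.argmax(stats))              # candidate chosen at this forward step
print(best, stats.shape)
```

In a forward selection loop, the chosen column would be appended to `X0` and the statistics recomputed, so each step costs one model fit rather than one fit per candidate. Handling correlated responses would require a working correlation structure in place of the independence assumption used here.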