Model Selection for Good Estimation and Prediction over a User-Specified Covariate Distribution for Linear Models under the Frequentist Paradigm


Christine M. Anderson-Cook, Statistical Sciences Group, Los Alamos National Laboratory, Los Alamos, NM 87545, USA.



Model selection is an important part of estimation and prediction for linear models with multiple explanatory variables (covariates). A variety of existing approaches focus on the estimation of model parameters or on the fit of the model where data have been observed. This article proposes an alternative strategy that selects models based on the mean squared error of the estimated expected response over a user-specified distribution of interest on the covariate space. We discuss numerical and graphical tools for detailed comparisons among different models. These tools help select the best model based on its ability to estimate the mean response at covariate locations likely to arise from the distribution of interest, and they can be combined with cost considerations when deciding whether to include specific covariates. The proposed method is illustrated with three examples. We also present simulation results demonstrating situations where the proposed method improves on some standard alternatives. Copyright © 2011 John Wiley & Sons, Ltd.
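
The selection criterion described above can be illustrated with a minimal sketch. This is not the article's procedure. It assumes a simulation setting in which the true coefficient vector and error standard deviation are known, so that the mean squared error of the fitted mean response can be computed exactly (variance plus squared bias) and averaged over draws from a distribution of interest. The function name average_mse, the design matrix, the coefficients, and the target distribution are all hypothetical choices made for illustration.

```python
from itertools import combinations

import numpy as np


def average_mse(X, beta, sigma, subset, targets):
    """MSE of the fitted mean response for the submodel using the columns in
    `subset`, averaged over the rows of `targets` (draws from the covariate
    distribution of interest). Assumes the true `beta` and `sigma` are known,
    which holds only in a simulation study."""
    Xs = X[:, subset]
    # Least-squares operator of the submodel: coefficient estimate = A @ y
    A = np.linalg.solve(Xs.T @ Xs, Xs.T)
    Ts = targets[:, subset]
    # Variance of the fitted mean at each target point: sigma^2 * ||A^T x_s||^2
    var = sigma**2 * np.sum((Ts @ A) ** 2, axis=1)
    # Bias at each target point: expected fitted mean minus true mean
    bias = Ts @ (A @ (X @ beta)) - targets @ beta
    return float(np.mean(var + bias**2))


rng = np.random.default_rng(0)
n = 40

# Hypothetical observed design: intercept plus three covariates
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])

# Hypothetical truth: one strong effect, one weak effect, one null effect
beta = np.array([1.0, 2.0, 0.05, 0.0])
sigma = 1.0

# Hypothetical distribution of interest, narrower than and shifted from the data
targets = np.column_stack(
    [np.ones(2000), rng.normal(loc=0.5, scale=0.3, size=(2000, 3))]
)

# Every candidate model keeps the intercept (column 0)
candidates = [(0,) + c for k in range(4) for c in combinations((1, 2, 3), k)]
scores = {c: average_mse(X, beta, sigma, list(c), targets) for c in candidates}
best = min(scores, key=scores.get)

for cols, score in sorted(scores.items(), key=lambda item: item[1]):
    print(cols, round(score, 4))
print("selected columns:", best)
```

In this sketch, dropping a covariate trades a reduction in variance against a possible increase in bias, and the balance depends on where the distribution of interest places its mass. With real data the true coefficients are unknown, so the bias term would have to be estimated; how to do that is not addressed here.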