Chapter 3. Multiple Regression and Model Building

  1. Daniel T. Larose Ph.D. Director

Published Online: 30 JAN 2006

DOI: 10.1002/0471756482.ch3

Data Mining Methods and Models

How to Cite

Larose, D. T. (2005) Multiple Regression and Model Building, in Data Mining Methods and Models, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/0471756482.ch3

Author Information

  1. Department of Mathematical Sciences, Central Connecticut State University, USA

Publication History

  1. Published Online: 30 JAN 2006
  2. Published Print: 11 NOV 2005

ISBN Information

Print ISBN: 9780471666561

Online ISBN: 9780471756484



Keywords

  • categorical predictors;
  • indicator variables;
  • multicollinearity;
  • variance inflation factor;
  • model selection methods;
  • forward selection;
  • backward elimination;
  • stepwise regression;
  • best-subsets


Multiple regression, in which more than one predictor variable is used to estimate a response variable, is introduced by way of an example. To allow for inference, the multiple regression model is then defined; both the model and its inferential methods extend the simple linear regression case. Next, regression with categorical predictors, encoded as indicator variables, is explained. The problems of multicollinearity are examined: when predictors are too highly correlated with one another, the fitted response surface becomes unstable. The variance inflation factor is defined as an aid in identifying multicollinear predictors. Variable selection methods are then presented, including forward selection, backward elimination, stepwise regression, and best-subsets regression. Mallows' Cp statistic is defined as an aid in variable selection. Finally, methods for using principal components as predictors in multiple regression are discussed.
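
As a rough illustration of the variance inflation factor mentioned in the summary, the sketch below computes it with NumPy using the standard definition VIF_i = 1 / (1 - R_i^2), where R_i^2 is the R-squared obtained by regressing the i-th predictor on the remaining predictors. The function name `variance_inflation_factors` and the synthetic data are assumptions made for this example and are not taken from the chapter.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return VIF_i = 1 / (1 - R_i^2) for each column of X.

    R_i^2 comes from an ordinary least-squares regression of column i
    on all other columns, with an intercept term.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])  # design matrix with intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = resid @ resid
        ss_tot = ((y - y.mean()) ** 2).sum()
        r2 = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

# Synthetic data, made up for illustration:
# x2 is almost a copy of x1, while x3 is generated independently.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)

vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(vifs)
```

In this setup the two nearly collinear predictors should receive very large factors, while the independent predictor should stay close to 1. The sketch does not guard against a predictor that is an exact linear combination of the others, in which case R_i^2 equals 1 and the factor is undefined.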