Chapter 2. Regression Modeling

  1. Daniel T. Larose Ph.D. Director

Published Online: 30 JAN 2006

DOI: 10.1002/0471756482.ch2

Data Mining Methods and Models

How to Cite

Larose, D. T. (2005) Regression Modeling, in Data Mining Methods and Models, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/0471756482.ch2

Author Information

  1. Department of Mathematical Sciences, Central Connecticut State University, USA

Publication History

  1. Published Print: 11 NOV 2005

ISBN Information

Print ISBN: 9780471666561

Online ISBN: 9780471756484



  • simple linear regression;
  • least squares;
  • prediction error;
  • outlier;
  • high leverage point;
  • influential observation;
  • confidence interval;
  • prediction interval;
  • transformations


Chapter 2 begins by using an example to introduce simple linear regression and the concept of least squares. The usefulness of the regression is then measured by the coefficient of determination r², and the typical prediction error is estimated using the standard error of the estimate s. The correlation coefficient r is discussed, along with the ANOVA table, which displays the results succinctly. Outliers, high leverage points, and influential observations are discussed in detail. Moving from descriptive methods to inference, the regression model is introduced. The t-test for the relationship between x and y is shown, along with the confidence interval for the slope of the regression line, the confidence interval for the mean value of y given x, and the prediction interval for a randomly chosen value of y given x. Methods are shown for verifying the assumptions underlying the regression model. Detailed examples are provided using the Baseball and California data sets. Finally, methods for applying transformations to achieve linearity are presented.
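The quantities listed above can be illustrated with a short sketch. The Python below is not taken from the chapter. It is a minimal, standard-library-only illustration that assumes the usual textbook formulas for simple linear regression: the least-squares estimates b0 and b1, the ANOVA split SST = SSR + SSE, r² = SSR/SST, s = sqrt(SSE/(n - 2)), the t statistic for the slope, the leverage of an x-value, and the three intervals. The names SimpleRegression, fit, leverage, slope_interval, mean_interval and prediction_interval are hypothetical. Because the standard library has no t distribution, the caller is assumed to supply the critical value t_crit (with n - 2 degrees of freedom) from a t table.

```python
import math
from dataclasses import dataclass

# Hypothetical helper names; a sketch of textbook formulas, not the chapter's code.


@dataclass
class SimpleRegression:
    b0: float       # estimated intercept
    b1: float       # estimated slope
    r2: float       # coefficient of determination, SSR / SST
    s: float        # standard error of the estimate, sqrt(SSE / (n - 2))
    r: float        # correlation coefficient (sign follows the slope)
    t_slope: float  # t statistic for H0: beta1 = 0, with n - 2 df
    se_b1: float    # standard error of the slope estimate
    x_bar: float    # mean of x
    sxx: float      # sum of squared deviations of x from its mean
    n: int          # number of observations


def fit(x, y):
    """Least-squares fit of y-hat = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 points")
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # ANOVA pieces: SST = SSR + SSE
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = b1 * sxy  # equals sum of (y-hat - y_bar)^2
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = ssr / sst
    s = math.sqrt(sse / (n - 2))
    r = math.copysign(math.sqrt(r2), b1)
    se_b1 = s / math.sqrt(sxx)
    return SimpleRegression(b0, b1, r2, s, r, b1 / se_b1, se_b1, x_bar, sxx, n)


def leverage(model, xi):
    """Leverage of an observation at x = xi; large values flag high-leverage points."""
    return 1 / model.n + (xi - model.x_bar) ** 2 / model.sxx


def slope_interval(model, t_crit):
    """Confidence interval for the slope: b1 +/- t_crit * se(b1)."""
    half = t_crit * model.se_b1
    return model.b1 - half, model.b1 + half


def mean_interval(model, x_p, t_crit):
    """Confidence interval for the mean value of y given x = x_p."""
    y_hat = model.b0 + model.b1 * x_p
    half = t_crit * model.s * math.sqrt(leverage(model, x_p))
    return y_hat - half, y_hat + half


def prediction_interval(model, x_p, t_crit):
    """Prediction interval for a randomly chosen y given x = x_p."""
    y_hat = model.b0 + model.b1 * x_p
    half = t_crit * model.s * math.sqrt(1 + leverage(model, x_p))
    return y_hat - half, y_hat + half


if __name__ == "__main__":
    # Toy data for illustration only; not from the Baseball or California data sets.
    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 5]
    m = fit(x, y)
    print(f"y-hat = {m.b0:.2f} + {m.b1:.2f} x, r^2 = {m.r2:.3f}, s = {m.s:.3f}")
    # 3.182 is the two-sided 95% t critical value for n - 2 = 3 df (from a t table).
    print("95% CI for slope:", slope_interval(m, 3.182))
    print("95% PI at x = 3:", prediction_interval(m, 3, 3.182))
```

Because the prediction interval adds 1 under the square root, it is always wider than the confidence interval for the mean at the same x. This reflects the extra uncertainty of predicting a single y rather than estimating the mean of y.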