# Chapter 2. Regression Modeling

Published Online: 30 JAN 2006

DOI: 10.1002/0471756482.ch2

Copyright © 2006 John Wiley & Sons, Inc.

Book Title

## Data Mining Methods and Models

Additional Information

#### How to Cite

Larose, D. T. (2005) Regression Modeling, in Data Mining Methods and Models, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/0471756482.ch2

#### Publication History

- Published Online: 30 JAN 2006
- Published Print: 11 NOV 2005

#### ISBN Information

Print ISBN: 9780471666561

Online ISBN: 9780471756484


### Keywords

- simple linear regression
- least squares
- prediction error
- outlier
- high leverage point
- influential observation
- confidence interval
- prediction interval
- transformations

### Summary

Chapter 2 begins with an example that introduces simple linear regression and the concept of least squares. The usefulness of the regression is then measured by the coefficient of determination, *r*², and the typical prediction error is estimated using the standard error of the estimate, *s*. The correlation coefficient *r* is discussed, along with the ANOVA table for succinct display of results. Outliers, high leverage points, and influential observations are discussed in detail. Moving from descriptive methods to inference, the chapter then introduces the regression model. The *t*-test for the relationship between *x* and *y* is shown, along with the confidence interval for the slope of the regression line, the confidence interval for the mean value of *y* given *x*, and the prediction interval for a randomly chosen value of *y* given *x*. Methods are shown for verifying the assumptions underlying the regression model. Detailed examples are provided using the *Baseball* and *California* data sets. Finally, methods of applying transformations to achieve linearity are provided.
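As a rough illustration of the quantities named in the summary, the following Python sketch fits a least-squares line and computes the coefficient of determination and the standard error of the estimate. It is not taken from the chapter: the function name `simple_linear_regression`, the dictionary keys, and the sample data are all hypothetical, and the formulas are the standard textbook ones for a single predictor.

```python
import math


def simple_linear_regression(x, y):
    """Fit y = b0 + b1 * x by least squares (hypothetical helper, not from the chapter)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products about the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or sst == 0:
        raise ValueError("x and y must each vary")

    # Least-squares estimates of the slope and intercept.
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Sum of squared prediction errors (residuals).
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": 1 - sse / sst,      # coefficient of determination
        "s": math.sqrt(sse / (n - 2)),   # standard error of the estimate
    }


if __name__ == "__main__":
    # Made-up illustrative data, not the Baseball or California data sets.
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    for name, value in simple_linear_regression(x, y).items():
        print(f"{name}: {value:.4f}")
```

The divisor `n - 2` in `s` reflects the two estimated parameters (intercept and slope); with more predictors the degrees of freedom would change accordingly.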