Chapter 4. Logistic Regression

  1. Daniel T. Larose, Ph.D., Director

Published Online: 30 JAN 2006

DOI: 10.1002/0471756482.ch4

Data Mining Methods and Models

How to Cite

Larose, D. T. (2005) Logistic Regression, in Data Mining Methods and Models, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/0471756482.ch4

Author Information

  1. Department of Mathematical Sciences, Central Connecticut State University, USA

Publication History

  1. Published Online: 30 JAN 2006
  2. Published Print: 11 NOV 2005

ISBN Information

Print ISBN: 9780471666561

Online ISBN: 9780471756484

Keywords

  • maximum likelihood estimation;
  • categorical response;
  • classification;
  • the zero-cell problem;
  • multiple logistic regression;
  • WEKA


Logistic regression is introduced by way of a simple example that predicts the presence of disease from age. The maximum likelihood estimation methods for logistic regression are outlined, with emphasis on interpreting logistic regression output. Inference within the logistic regression framework is discussed, including how to determine whether the predictors are significant. Methods for interpreting the logistic regression model are examined for dichotomous, polychotomous, and continuous predictors. The assumption of linearity in the logit is discussed, as are methods for tackling the zero-cell problem. We then turn to multiple logistic regression, where more than one predictor is used to classify a response. Methods for introducing higher-order terms to handle nonlinearity are discussed. As usual, the logistic regression model must be validated. Finally, logistic regression is demonstrated on a small example using the freely available software WEKA.
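Maximum likelihood estimates for logistic regression have no closed form, so they are found iteratively. The script below is a minimal sketch that illustrates three ideas from the abstract. It is not the chapter's own code or data. First, it fits the simple model pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)) by Newton-Raphson, which is one common optimizer for this purpose. Second, it tests H0: b1 = 0 with a likelihood ratio statistic G, referred to a chi-square distribution with 1 degree of freedom. Third, it reports the odds ratio exp(b1) for a continuous predictor. The AGE/DISEASE values and the helper names (`prob`, `log_likelihood`, `fit_simple_logistic`) are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data (NOT from the chapter): age in years, disease indicator (1 = present).
AGE = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75]
DISEASE = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]


def prob(b0, b1, x):
    """Logistic model: pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of the simple logistic model."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = prob(b0, b1, xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll


def fit_simple_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood estimates (b0, b1) via Newton-Raphson (hypothetical helper)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = prob(b0, b1, xi)
            w = p * (1.0 - p)        # Bernoulli variance: weight in the information matrix
            g0 += yi - p             # score (gradient) for b0
            g1 += (yi - p) * xi      # score (gradient) for b1
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Newton step: solve [[h00, h01], [h01, h11]] * step = score
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


b0, b1 = fit_simple_logistic(AGE, DISEASE)

# Likelihood ratio test of H0: b1 = 0. The intercept-only MLE is b0 = logit(mean(y)).
p_bar = sum(DISEASE) / len(DISEASE)
ll_null = log_likelihood(math.log(p_bar / (1.0 - p_bar)), 0.0, AGE, DISEASE)
ll_full = log_likelihood(b0, b1, AGE, DISEASE)
G = 2.0 * (ll_full - ll_null)
p_value = math.erfc(math.sqrt(G / 2.0))  # upper tail of chi-square with 1 df

# Odds ratios: multiplicative change in the odds of disease per 1-year and per 10-year increase in age.
odds_ratio_1 = math.exp(b1)
odds_ratio_10 = math.exp(10 * b1)

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"G = {G:.3f}, p-value = {p_value:.4f}")
print(f"odds ratio per year = {odds_ratio_1:.4f}, per decade = {odds_ratio_10:.4f}")
```

When run, the script prints the fitted coefficients, G with its p-value, and the odds ratios per year and per decade of age. The sketch does not guard against separation. If every diseased subject were older than every healthy one, a situation closely related to the zero-cell problem, the likelihood would have no finite maximum and the loop would not settle on finite estimates.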