20. Logistic Regression in Genomewide Association Analysis

Editors:

  1. Mourad Elloumi (3) and
  2. Albert Y. Zomaya (4)

Authors:

  1. Wentian Li (1) and
  2. Yaning Yang (2)

Published Online: 27 DEC 2013

DOI: 10.1002/9781118617151.ch20

Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data

How to Cite

Li, W. and Yang, Y. (2013) Logistic Regression in Genomewide Association Analysis, in Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data (eds M. Elloumi and A. Y. Zomaya), John Wiley & Sons, Inc., Hoboken, New Jersey. doi: 10.1002/9781118617151.ch20

Editor Information

  1. (3) Laboratory of Technologies of Information and Communication and Electrical Engineering (LaTICE) and University of Tunis-El Manar, Tunisia
  2. (4) The University of Sydney

Author Information

  1. (1) Robert S. Boas Center for Genomics and Human Genetics, Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, New York
  2. (2) Department of Statistics and Finance, University of Science and Technology of China, Hefei, China

Publication History

  1. Published Online: 27 DEC 2013
  2. Published Print: 16 DEC 2013

ISBN Information

Print ISBN: 9781118132739

Online ISBN: 9781118617151

Keywords:

  • Fisher's nonadditivity interaction;
  • genomewide association analysis;
  • latent variables;
  • logistic regression;
  • partial least-squares regression;
  • penalized regression;
  • single genetic marker;
  • single-nucleotide polymorphism (SNP);
  • two genetic markers;
  • variable reduction

Summary

A two-allele single-nucleotide polymorphism (SNP) can be represented by several coding schemes, using either indicator variables or a scaled numerical variable. The optimal coding scheme depends on the underlying disease model. Coefficients in a logistic regression model are related to population measures of the disease. In the logistic regression context, Fisher's statistical interaction is defined as a cross-product term; other notions of interaction include Bateson's epistasis and biochemical interaction. Variable reduction is an important part of applying logistic regression in genomewide association studies (GWAS). Partial least squares (PLS) constructs latent variables as linear combinations of the independent variables so that the covariance between each latent variable and the dependent variable is maximized. Because logic functions are highly nonlinear, they have the potential to model complicated gene–gene interactions more efficiently than linear combinations.
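
As a rough illustration of the first points in the summary, the sketch below shows how one genotype might be turned into covariates under different coding schemes, and how a cross-product term enters the log-odds for two markers. It is not taken from the chapter: the scheme names (additive, dominant, recessive, indicator), the function names, and the coefficient values are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math

# Genotypes are counted as copies of one allele: 0, 1, or 2.
# The scheme names are common conventions, assumed here for illustration.
def code_snp(g, scheme="additive"):
    """Return the list of covariates representing genotype g under a scheme."""
    if g not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2")
    if scheme == "additive":      # scaled numerical variable: 0, 1, 2
        return [float(g)]
    if scheme == "dominant":      # one or two copies treated alike
        return [1.0 if g >= 1 else 0.0]
    if scheme == "recessive":     # only two copies count
        return [1.0 if g == 2 else 0.0]
    if scheme == "indicator":     # two dummy variables, genotype 0 as baseline
        return [1.0 if g == 1 else 0.0, 1.0 if g == 2 else 0.0]
    raise ValueError(f"unknown scheme: {scheme}")


def logit_two_snps(g1, g2, b0, b1, b2, b12):
    """Log-odds of disease with additive main effects and a cross-product term."""
    x1 = code_snp(g1)[0]
    x2 = code_snp(g2)[0]
    return b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2


def disease_probability(log_odds):
    """Inverse logit: map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients, chosen only to show the arithmetic.
eta = logit_two_snps(g1=1, g2=2, b0=-2.0, b1=0.5, b2=0.25, b12=0.5)
p = disease_probability(eta)
```

With these made-up values the log-odds is -2.0 + 0.5 + 0.5 + 1.0 = 0, so the modelled disease probability is 0.5. Setting `b12` to zero removes the interaction and leaves a purely additive model on the log-odds scale.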