Standard Article

Information theory as a model of genomic sequences

Part 4. Bioinformatics

4.2. Gene Finding and Gene Structure

Specialist Review

  1. Chengpeng Bi,
  2. Peter K. Rogan

Published Online: 15 APR 2005

DOI: 10.1002/047001153X.g402204

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Bi, C. and Rogan, P. K. 2005. Information theory as a model of genomic sequences. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.2:18.

Author Information

  1. University of Missouri, Kansas City, MO, USA

Publication History

  1. Published Online: 15 APR 2005

Abstract

Shannon information theory can be used to quantify overall sequence conservation among sets of related sequences. Variation in nucleic acid sequences recognized by proteins can be comprehensively modeled with information weight matrices, which permit each member sequence to be rank-ordered according to its individual information content. These rankings can be used to compute the affinities of proteins for their recognition sites and to predict the effects of nucleotide substitutions in these sites. The distribution of information across a set of protein-binding sites in DNA is related to the pattern of intermolecular contacts that stabilize the protein-nucleic acid complex (i.e., the corresponding helical structure of double-stranded DNA).
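
As an illustration of the weight-matrix idea summarized above, the following minimal Python sketch builds an information weight matrix from a small set of aligned sites and rank-orders the sites by individual information. It assumes one common formulation in which the weight of base b at position l is 2 + log2 f(b, l) bits, with f the observed frequency. The example sites, the function names, and the pseudocount used to avoid log2(0) are hypothetical choices made for illustration, and the small-sample correction normally applied to such estimates is omitted.

```python
import math

BASES = "ACGT"


def weight_matrix(sites, pseudocount=0.5):
    """Return one dict per position mapping base -> weight in bits.

    Weight is 2 + log2 f(b, l). The pseudocount is an assumption made
    here so that unobserved bases do not produce log2(0).
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must be aligned to the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        weights = {}
        for base in BASES:
            freq = (column.count(base) + pseudocount) / total
            weights[base] = 2.0 + math.log2(freq)
        matrix.append(weights)
    return matrix


def individual_information(matrix, site):
    """Sum the weights of the bases present in one site (bits)."""
    return sum(weights[base] for weights, base in zip(matrix, site))


def average_information(matrix):
    """Total conservation: sum over positions of 2 - H(l), in bits.

    Frequencies are recovered from the weights as f = 2 ** (w - 2).
    """
    return sum(
        (2 ** (w - 2)) * w for weights in matrix for w in weights.values()
    )


# Hypothetical aligned binding sites, for illustration only.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT"]
matrix = weight_matrix(sites)

# Rank-order the member sequences by individual information content.
ranked = sorted(
    ((s, individual_information(matrix, s)) for s in sites),
    key=lambda pair: pair[1],
    reverse=True,
)
for site, bits in ranked:
    print(f"{site}  {bits:.2f} bits")
print(f"average information: {average_information(matrix):.2f} bits")
```

In this sketch, a nucleotide substitution is evaluated by scoring the variant sequence against the same matrix and comparing its information with that of the original site; a lower value suggests a weaker predicted site under these assumptions.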

Keywords:

  • information theory;
  • entropy;
  • thermodynamics;
  • surprisal;
  • weight matrices;
  • binding sites;
  • sequence logo;
  • sequence walker;
  • model refinement;
  • evolution