Information theory as a model of genomic sequences
Part 4. Bioinformatics
4.2. Gene Finding and Gene Structure
Published Online: 15 APR 2005
Copyright © 2005 John Wiley & Sons, Ltd
Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics
How to Cite
Bi, C. and Rogan, P. K. 2005. Information theory as a model of genomic sequences. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.2:18.
Shannon information theory can be used to quantify overall sequence conservation among sets of related sequences. Variation in nucleic acid sequences recognized by proteins can be comprehensively modeled with information weight matrices, which permit each member sequence to be rank-ordered according to its individual information content. These rankings can be used to compute the affinities of proteins for their recognition sites and to predict the effects of nucleotide substitutions within those sites. The distribution of information across a set of protein-binding sites in DNA is related to the pattern of intermolecular contacts that stabilize the protein-nucleic acid complex (i.e., the corresponding helical structure of double-stranded DNA).
- information theory;
- weight matrices;
- binding sites;
- sequence logo;
- sequence walker;
- model refinement;
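
The ranking of member sequences by individual information content, as summarized in the abstract, can be illustrated with a minimal sketch. It makes several assumptions that are not stated in the text above: a uniform background of 2 bits per nucleotide, no small-sample correction, weights of the form 2 + log2 f(b, l), and a made-up set of four aligned six-base sites. All function names are hypothetical and chosen only for this illustration.

```python
import math

BASES = "ACGT"

def position_frequencies(sites):
    """Per-position base frequencies for a set of aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    freqs = []
    for pos in range(length):
        counts = {b: 0 for b in BASES}
        for s in sites:
            counts[s[pos]] += 1
        freqs.append({b: counts[b] / len(sites) for b in BASES})
    return freqs

def information_weight_matrix(freqs):
    """Weight of base b at position l: 2 + log2 f(b, l), in bits.

    Assumption: uniform background (2 bits) and no small-sample correction.
    Bases never observed at a position get a weight of -infinity.
    """
    return [
        {b: (2.0 + math.log2(f)) if f > 0 else float("-inf") for b, f in col.items()}
        for col in freqs
    ]

def individual_information(site, weights):
    """Sum of the weights selected by each base of one site, in bits."""
    return sum(col[base] for base, col in zip(site, weights))

def total_information(freqs):
    """Overall conservation: sum over positions of 2 - H(l), in bits."""
    return sum(
        2.0 + sum(f * math.log2(f) for f in col.values() if f > 0)
        for col in freqs
    )

# Hypothetical aligned binding sites, invented for this example only.
sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]

freqs = position_frequencies(sites)
weights = information_weight_matrix(freqs)

# Rank the distinct member sequences from strongest to weakest.
ranked = sorted(set(sites), key=lambda s: individual_information(s, weights), reverse=True)
for s in ranked:
    print(s, round(individual_information(s, weights), 3), "bits")

# Average individual information over the set equals the overall conservation.
mean_ri = sum(individual_information(s, weights) for s in sites) / len(sites)
r_sequence = total_information(freqs)

# Predicted effect of one substitution (A to G at the fourth position).
delta = individual_information("TATGAT", weights) - individual_information("TATAAT", weights)
print("change in information:", round(delta, 3), "bits")
```

Under these assumptions, the mean of the individual information values over the set equals the overall conservation of the set, and the difference between the values for two sequences gives the predicted change, in bits, caused by a substitution. A practical implementation would also need a sampling correction for small sets and a policy for bases never observed at a position.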