Standard Article

Computational motif discovery

Part 4. Bioinformatics

4.2. Gene Finding and Gene Structure

Basic Techniques and Approaches

  1. Martin Tompa

Published Online: 15 JAN 2005

DOI: 10.1002/047001153X.g402417

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Tompa, M. 2005. Computational motif discovery. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.2:28.

Author Information

  1. University of Washington, Seattle, WA, USA

Abstract

The goal of computational motif discovery is to predict short subsequences of biological sequences that are good candidates to serve some biological function. This article focuses on the computational prediction of protein binding sites in nucleotide sequences. Three types of motif model are described: consensus, IUPAC, and weight matrix. Two types of application are described: statistical overrepresentation (in which the input sequences come from a single genome and are believed to contain instances of a single motif) and phylogenetic footprinting (in which the input sequences are homologous, typically one from each of multiple related genomes). Programs for each of these applications are briefly described, with references to fuller descriptions and to web sites where the programs are available.
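
As a rough illustration of the three motif models named above, the following Python sketch shows one common way each can be represented and matched against a candidate site. It is not taken from the article: the function names, the pseudocount of 1, the uniform background frequency of 0.25, and the example sites are all assumptions made for illustration only.

```python
import math

# Standard IUPAC nucleotide codes mapped to the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def consensus_mismatches(consensus, site):
    """Consensus model: count positions where the site differs from the consensus."""
    if len(consensus) != len(site):
        raise ValueError("consensus and site must have the same length")
    return sum(c != s for c, s in zip(consensus, site))

def iupac_matches(pattern, site):
    """IUPAC model: each pattern symbol allows a set of bases at that position."""
    return len(pattern) == len(site) and all(
        s in IUPAC[p] for p, s in zip(pattern, site)
    )

def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Weight matrix model: per-position log-odds scores from aligned sites.

    The pseudocount and uniform background are illustrative assumptions.
    """
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(len(sites[0])):
        column = [s[i] for s in sites]
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return matrix

def score(matrix, site):
    """Sum the log-odds contribution of each base in the site."""
    return sum(col[b] for col, b in zip(matrix, site))

# Hypothetical aligned binding sites, used only to exercise the functions.
example_sites = ["TATAAT", "TATGAT", "TACAAT"]
example_matrix = weight_matrix(example_sites)
```

With these definitions, a site that resembles the example sites scores higher under the weight matrix than an unrelated one, while the consensus and IUPAC models give a mismatch count and a yes/no answer, respectively.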

Keywords:

  • motif;
  • binding site;
  • consensus;
  • IUPAC;
  • weight matrix;
  • position-specific scoring matrix;
  • statistical overrepresentation;
  • phylogenetic footprinting