3. Biological Sequences, Sequence Alignment, and Statistics

  1. Matthew He (1) and
  2. Sergey Petoukhov (2)

Published Online: 12 OCT 2010

DOI: 10.1002/9780470904640.ch3

Mathematics of Bioinformatics: Theory, Practice, and Applications

How to Cite

He, M. and Petoukhov, S. (2010) Biological Sequences, Sequence Alignment, and Statistics, in Mathematics of Bioinformatics: Theory, Practice, and Applications, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9780470904640.ch3

Author Information

  1. Nova Southeastern University, Fort Lauderdale, Florida, USA
  2. Russian Academy of Sciences, Moscow, Russia

Publication History

  1. Published Online: 12 OCT 2010
  2. Published Print: 17 DEC 2010

Book Series:

  1. Bioinformatics: Computational Techniques and Engineering

Book Series Editors:

  1. Yi Pan and
  2. Albert Y. Zomaya

ISBN Information

Print ISBN: 9780470404430

Online ISBN: 9780470904640

Keywords:

  • Bayesian approach;
  • binary sequences;
  • biological sequences;
  • multiple sequence alignment;
  • optimal sequence alignment;
  • pairwise sequence alignment;
  • statistics

Summary

This chapter defines biological sequences, mathematical sequences, and binary sequences as used in theoretical computer science. It describes pairwise sequence alignment, multiple sequence alignment, and optimal sequence alignment. The chapter discusses the scoring systems used to rank alignments, the algorithms used to find optimally scoring alignments, and the statistical methods used to evaluate the significance of an alignment score. Sequence alignment and analysis rest on the fact that biological sequences develop from preexisting sequences rather than being invented by nature from scratch. A major concern when interpreting alignment results is whether the similarity between sequences is biologically significant. Two approaches are presented. One is the classical statistical approach, which calculates the probability of obtaining by chance a match score greater than the value observed. The other is the Bayesian approach, which is based on a comparison of models.
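
As a rough illustration of the classical approach mentioned in the summary, the sketch below scores a pairwise global alignment and then estimates how often a shuffled sequence scores at least as well. It is not taken from the chapter: the Needleman-Wunsch recurrence with a linear gap penalty, the scoring values (match=1, mismatch=-1, gap=-2), the shuffle-based null model, and the function names are all assumptions chosen for brevity.

```python
import random


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score (Needleman-Wunsch, linear gap penalty).

    The scoring values are illustrative assumptions, not the chapter's.
    """
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def empirical_p_value(a, b, trials=200, seed=0):
    """Estimate the chance that a shuffled copy of b scores at least as well as b.

    Shuffling b is an assumed null model; it preserves composition only.
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if global_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (hits + 1) / (trials + 1)


if __name__ == "__main__":
    score, p = empirical_p_value("GATTACAGATTACA", "GATTACAGATTACA")
    print(f"score={score}, empirical p-value={p:.3f}")
```

A small estimated p-value suggests the observed score is unlikely under the shuffled null model. The comparison uses "at least as large" rather than "strictly greater", which is a slightly conservative choice.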

Controlled Vocabulary Terms

binary sequences; biology