# 3. Biological Sequences, Sequence Alignment, and Statistics

1. Matthew He¹ and
2. Sergey Petoukhov²

Published Online: 12 OCT 2010

DOI: 10.1002/9780470904640.ch3

## Mathematics of Bioinformatics: Theory, Practice, and Applications

#### How to Cite

He, M. and Petoukhov, S. (2010) Biological Sequences, Sequence Alignment, and Statistics, in Mathematics of Bioinformatics: Theory, Practice, and Applications, John Wiley & Sons, Inc., Hoboken, NJ, USA. doi: 10.1002/9780470904640.ch3

#### Author Information

1. Nova Southeastern University, Fort Lauderdale, Florida, USA
2. Russian Academy of Sciences, Moscow, Russia

#### Publication History

1. Published Online: 12 OCT 2010
2. Published Print: 17 DEC 2010

#### Book Series:

1. Bioinformatics: Computational Techniques and Engineering

#### Book Series Editors:

1. Yi Pan and
2. Albert Y. Zomaya

#### ISBN Information

Print ISBN: 9780470404430

Online ISBN: 9780470904640

### Keywords:

• Bayesian approach;
• binary sequences;
• biological sequences;
• multiple sequence alignment;
• optimal sequence alignment;
• pairwise sequence alignment;
• statistics

### Summary

This chapter defines biological sequences, mathematical sequences, and binary sequences as used in theoretical computer science. It describes pairwise, multiple, and optimal sequence alignment, and discusses the scoring systems used to rank alignments, the algorithms used to find optimal-scoring alignments, and the statistical methods used to evaluate the significance of an alignment score. Sequence alignment and analysis rest on the fact that biological sequences develop from preexisting sequences rather than being invented by nature from scratch. A major concern when interpreting alignment results is whether the similarity between sequences is biologically significant. Two approaches are presented: the classical approach, which calculates the probability of obtaining by chance a match score greater than the observed value, and the Bayesian approach, which is based on a comparison of models.
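
As a rough illustration of the ideas named in the summary, the sketch below scores an optimal global pairwise alignment by dynamic programming and then estimates how often a score at least that large arises by chance. It is not taken from the chapter: the function names, the scoring values (match +1, mismatch -1, gap -2), and the use of a shuffled second sequence as the chance model are all assumptions made for the example.

```python
import random


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score of the best global alignment of a and b (Needleman-Wunsch).

    The scoring values are illustrative assumptions, not the chapter's.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]


def empirical_p_value(a, b, trials=200, seed=0):
    """Estimate the chance of a score at least as large as the observed one.

    Assumed null model: b with its letters shuffled, which preserves
    composition but destroys order. Counting ties and adding one to the
    numerator and denominator keeps the estimate above zero.
    """
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    letters = list(b)
    hits = 0
    for _ in range(trials):
        rng.shuffle(letters)
        if global_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)
```

For example, `global_alignment_score("ACGT", "AGT")` returns 1 under these assumed scores: three matches and one gap. `empirical_p_value` returns the observed score together with the estimated probability, and a small estimate suggests the similarity is unlikely to be due to chance alone under this particular null model.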

#### Controlled Vocabulary Terms

binary sequences; biology