Standard Article

FASTA Search Programs

  1. William R Pearson

Published Online: 15 APR 2014

DOI: 10.1002/9780470015902.a0005255.pub2

eLS

How to Cite

Pearson, W. R. 2014. FASTA Search Programs. eLS.

Author Information

  1. University of Virginia, Charlottesville, Virginia, USA

Abstract

The FASTA programs search protein and deoxyribonucleic acid (DNA) databases for sequences with statistically significant similarity. The programs compare protein sequences, DNA sequences, short peptides and oligonucleotides, and run on most popular computers. FASTA and BLAST both seek to identify homologous protein or DNA sequences. BLAST is faster, but FASTA is more flexible, providing both rigorous (SSEARCH, LALIGN, GGSEARCH and GLSEARCH) and heuristic (FASTA, FASTX/Y, TFASTX/Y and FASTS/M/F) algorithms, a wider range of scoring matrices and different approaches for estimating statistical significance. In addition, the FASTA programs offer options to search a small, representative database and then report results from a larger sequence set linked to the initial significant hits. The FASTA programs can also annotate alignments with the conservation state of aligned functional residues, such as active sites, and with subalignment scores associated with domain boundaries. The FASTA programs thus provide flexible and rigorous alternatives to BLAST for protein, translated-DNA and DNA alignment.

Key Concepts:

  • The FASTA program uses a heuristic (approximate) strategy for finding similar sequences, but the FASTA package includes SSEARCH and GGSEARCH, which provide rigorous algorithms.

  • Homologs can be identified because they share excess (statistically significant) sequence similarity.

  • E()-values (expect-values) report the statistical significance of a sequence similarity score as the number of times a score at least that high is expected by chance in the database search.

  • Sequence alignments are more accurate when the scoring matrix matches the evolutionary distance of the aligned sequences.

  • The FASTA programs can align against sequences not included in the initial search using library expansion.

  • The FASTA programs can use external annotations to modify aligned sequences and to partition similarity scores.
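
The relationship between a similarity score and its E()-value can be illustrated with a minimal sketch. It assumes that the raw score has already been normalised to a z-score (mean 0, standard deviation 1 for unrelated sequences) and that unrelated scores follow an extreme value (Gumbel) distribution; the E()-value is then the single-comparison probability multiplied by the number of library sequences. The function names `evd_pvalue` and `expect_value` and the example numbers are hypothetical and are not part of the FASTA package, which estimates its statistical parameters from the scores observed in each search.

```python
import math

# Euler-Mascheroni constant, used to centre the extreme value distribution
EULER_GAMMA = 0.5772156649015329


def evd_pvalue(z):
    """Probability that an unrelated sequence scores at least z.

    Assumes z is a normalised score (mean 0, standard deviation 1) and that
    unrelated scores follow an extreme value (Gumbel) distribution.
    """
    x = -(math.pi / math.sqrt(6.0)) * z - EULER_GAMMA
    # 1 - exp(-exp(x)), written with expm1 to keep precision for tiny values
    return -math.expm1(-math.exp(x))


def expect_value(z, db_size):
    """Expected number of unrelated sequences scoring at least z
    in a search of db_size library sequences."""
    return db_size * evd_pvalue(z)


if __name__ == "__main__":
    # Hypothetical library of 500,000 sequences
    for z in (2.0, 6.0, 10.0, 14.0):
        print(f"z = {z:5.1f}   E() = {expect_value(z, 500_000):.3g}")
```

Under these assumptions the same score becomes less significant as the library grows, because the E()-value scales linearly with the number of sequences searched.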

Keywords:

  • sequence similarity;
  • homology;
  • statistical significance;
  • protein sequence comparison;
  • DNA sequence comparison