
UNIT 9.4 Using Relational Databases for Improved Sequence Similarity Searching and Large-Scale Genomic Analyses

  1. Aaron J. Mackey,
  2. William R. Pearson

Published Online: 1 OCT 2004

DOI: 10.1002/0471250953.bi0904s7

Current Protocols in Bioinformatics


How to Cite

Mackey, A. J. and Pearson, W. R. 2004. Using Relational Databases for Improved Sequence Similarity Searching and Large-Scale Genomic Analyses. Current Protocols in Bioinformatics. 7:9.4:9.4.1–9.4.25.

Author Information

  1. University of Virginia, Charlottesville, Virginia

Publication History

  1. Published Online: 1 OCT 2004
  2. Published Print: SEP 2004

Abstract

Relational databases are designed to integrate diverse types of information and to manage large sets of search results, which greatly simplifies genome-scale analyses. They are essential for managing the results of large-scale sequence analyses, and they can also improve the statistical significance of similarity searches by restricting a search to the subsets of a sequence library most likely to contain homologs. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates several large-scale genomic analyses of homology-related data. It begins with the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols. Those protocols cover basic use of the database to generate a novel sequence library subset, extending seqdb_demo to store sequence similarity search results, and using various kinds of stored search results to address aspects of comparative genomic analysis.
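
As a rough illustration of the library-subset idea, the sketch below uses Python's built-in sqlite3 module to select the proteins of one taxon and write them out as a FASTA-format library. The table name `protein`, its columns, and the example rows are hypothetical and chosen only for this illustration; they are not the seqdb_demo schema, and the database engine is an assumption. The helper `scaled_evalue` applies the common approximation that an expectation value grows in proportion to the number of library sequences searched, which is why a smaller, well-chosen subset can make the same alignment score more significant.

```python
import sqlite3

# Hypothetical minimal schema for illustration; the real seqdb_demo tables may differ.
SCHEMA = """
CREATE TABLE protein (
    prot_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    taxon   TEXT NOT NULL,
    seq     TEXT NOT NULL
);
"""

# Made-up example rows (not real database entries).
ROWS = [
    (1, "protA", "Escherichia coli", "MKTAYIAKQR"),
    (2, "protB", "Homo sapiens", "MEEPQSDPSV"),
    (3, "protC", "Escherichia coli", "MSKGEELFTG"),
]


def build_demo_db():
    """Create an in-memory database holding the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO protein VALUES (?, ?, ?, ?)", ROWS)
    conn.commit()
    return conn


def subset_as_fasta(conn, taxon):
    """Return all proteins from one taxon as a FASTA-format string."""
    cur = conn.execute(
        "SELECT prot_id, name, seq FROM protein WHERE taxon = ? ORDER BY prot_id",
        (taxon,),
    )
    return "".join(f">{pid} {name}\n{seq}\n" for pid, name, seq in cur)


def scaled_evalue(pvalue, library_size):
    """Approximate an expectation value as the per-sequence probability
    multiplied by the number of sequences in the searched library."""
    return pvalue * library_size


conn = build_demo_db()
fasta = subset_as_fasta(conn, "Escherichia coli")
print(fasta)
```

The resulting string can be written to a file and passed to a similarity search program as a custom library. Under the stated approximation, a match with the same per-sequence probability receives a proportionally smaller expectation value against the subset than against the full library.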

Keywords:

  • relational database;
  • sequence similarity;
  • comparative genomic analysis;
  • homology