UNIT 9.4 Using Relational Databases for Improved Sequence Similarity Searching and Large-Scale Genomic Analyses
Published Online: 1 OCT 2004
DOI: 10.1002/0471250953.bi0904s7
Copyright © 2004 by John Wiley & Sons, Inc.

Current Protocols in Bioinformatics
How to Cite
Mackey, A. J. and Pearson, W. R. 2004. Using Relational Databases for Improved Sequence Similarity Searching and Large-Scale Genomic Analyses. Current Protocols in Bioinformatics. 7:9.4:9.4.1–9.4.25.
Publication History
- Published Print: SEP 2004
Abstract
Relational databases are designed to integrate diverse types of information and to manage large sets of search results, which greatly simplifies genome-scale analyses. They are essential for managing and interpreting large-scale sequence analyses, and they can also improve the statistical significance of similarity searches by focusing the search on the subsets of a sequence library most likely to contain homologs. This unit describes how to use relational databases to improve the efficiency of sequence similarity searching and demonstrates several large-scale genomic analyses of homology-related data. It covers the installation and use of a simple protein sequence database, seqdb_demo, which serves as the basis for the other protocols. These protocols cover basic use of the database to generate a novel sequence library subset, extending seqdb_demo to store sequence similarity search results, and using various kinds of stored search results to address aspects of comparative genomic analysis.
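The two core ideas in this abstract, carving a focused sequence library out of a database and storing search results so they can be queried later, can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a server-based database; the schema (tables `protein` and `hit`), the function names, and the toy sequences and E-values are all hypothetical and are not the seqdb_demo design. A focused library helps because a similarity search's E-value scales with the number of library sequences compared, so the same alignment score is more significant against a smaller library.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; it is NOT the
# seqdb_demo schema described in this unit.
SCHEMA = """
CREATE TABLE protein (
    prot_id INTEGER PRIMARY KEY,
    acc     TEXT NOT NULL UNIQUE,
    taxon   TEXT NOT NULL,
    seq     TEXT NOT NULL
);
CREATE TABLE hit (
    query_id INTEGER NOT NULL REFERENCES protein (prot_id),
    lib_id   INTEGER NOT NULL REFERENCES protein (prot_id),
    evalue   REAL NOT NULL
);
"""


def load_demo(conn):
    """Create the tables and fill them with toy (made-up) data."""
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO protein (acc, taxon, seq) VALUES (?, ?, ?)",
        [
            ("P1", "Escherichia coli", "MKTAYIAKQR"),
            ("P2", "Homo sapiens", "MSTNPKPQRK"),
            ("P3", "Escherichia coli", "MLSRAVCGTS"),
        ],
    )
    # Store two hypothetical search results for query P2,
    # resolving internal ids from accessions.
    conn.executemany(
        "INSERT INTO hit (query_id, lib_id, evalue) "
        "SELECT q.prot_id, l.prot_id, ? FROM protein AS q, protein AS l "
        "WHERE q.acc = ? AND l.acc = ?",
        [(1e-5, "P2", "P1"), (0.5, "P2", "P3")],
    )
    conn.commit()


def library_subset_fasta(conn, taxon):
    """Return only the proteins from one taxon as FASTA-formatted text."""
    rows = conn.execute(
        "SELECT acc, seq FROM protein WHERE taxon = ? ORDER BY acc", (taxon,)
    )
    return "".join(f">{acc}\n{seq}\n" for acc, seq in rows)


def significant_hits(conn, lib_taxon, max_evalue):
    """Return (query, library hit, E-value) rows at or below max_evalue
    whose library sequence comes from lib_taxon, best E-value first."""
    return conn.execute(
        """
        SELECT q.acc, l.acc, h.evalue
        FROM hit AS h
        JOIN protein AS q ON q.prot_id = h.query_id
        JOIN protein AS l ON l.prot_id = h.lib_id
        WHERE l.taxon = ? AND h.evalue <= ?
        ORDER BY h.evalue
        """,
        (lib_taxon, max_evalue),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_demo(conn)
    print(library_subset_fasta(conn, "Escherichia coli"), end="")
    print(significant_hits(conn, "Escherichia coli", 1e-3))
```

Writing the subset text to a file and giving it to a search program as its library is the step that shrinks the search space; once results are loaded into a table like `hit`, comparative questions become ordinary SQL joins rather than repeated parsing of search output.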
Keywords:
- relational database;
- sequence similarity;
- comparative genomic analysis;
- homology
