Standard Article

Gene finding using multiple related species: a classification approach

Part 4. Bioinformatics

4.2. Gene Finding and Gene Structure

Short Specialist Review

  1. Manolis Kellis

Published Online: 15 JUL 2005

DOI: 10.1002/047001153X.g402319

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Kellis, M. 2005. Gene finding using multiple related species: a classification approach. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.2:25.

Author Information

  1. MIT Broad Institute for Biomedical Research, Cambridge, MA, USA

Abstract

Three years after the initial sequencing of the human genome, the actual number of functional human genes remains uncertain. Several expression-based analyses still argue for a hundred thousand transcribed genes, whereas more conservative estimates range between 20 000 and 25 000 genes. The central question in such debates remains: what constitutes a real gene? In this paper, we address this question and present a comparative genomics approach for systematic gene identification, based on gene-specific signatures of evolutionary selection observed across multiple related species. First, we formulate gene identification as a classification problem between genes and noncoding regions, on the basis of their distinct patterns of nucleotide change. We then summarize the results of applying this approach to reannotate the yeast genome, with changes affecting nearly 15% of all genes and the rejection of more than 500 previously annotated genes. Finally, we discuss the implications of this analysis for understanding the human genome, and strategies for the systematic reannotation of higher eukaryotes.

Keywords:

  • comparative genomics;
  • evolutionary signatures;
  • systematic reannotation;
  • defining real genes;
  • genome analysis