Gene finding using multiple related species: a classification approach
Part 4. Bioinformatics
4.2. Gene Finding and Gene Structure
Short Specialist Review
Published Online: 15 JUL 2005
Copyright © 2005 John Wiley & Sons, Ltd
Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics
How to Cite
Kellis, M. 2005. Gene finding using multiple related species: a classification approach. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.2:25.
Three years after the initial sequencing of the human genome, the actual number of functional human genes remains uncertain. Several expression-based analyses still argue for a hundred thousand transcribed genes, whereas more conservative estimates range between 20 000 and 25 000 genes. The central question in such debates remains: what constitutes a real gene? In this paper, we address this question and present a comparative genomics approach for systematic gene identification, which detects gene-specific signatures of evolutionary selection across multiple related species. First, we formulate gene identification as a classification problem that separates genes from noncoding regions on the basis of their distinct patterns of nucleotide change. We then summarize the results of applying this approach to reannotate the yeast genome, with changes affecting nearly 15% of all genes and the rejection of more than 500 previously annotated genes. Finally, we discuss the implications of this analysis for understanding the human genome, and strategies for the systematic reannotation of higher eukaryotes.
- comparative genomics;
- evolutionary signatures;
- systematic reannotation;
- defining real genes;
- genome analysis
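The classification framing described in the abstract can be illustrated with a minimal sketch. The abstract does not specify which features of nucleotide change are used, so the feature below is an assumption chosen for illustration: in protein-coding regions, insertions and deletions between related species tend to preserve the reading frame, so gap lengths in an alignment are more often multiples of three. The function names, the input format (pairs of aligned rows with `-` for gaps), and the 0.8 threshold are all hypothetical and are not taken from the article.

```python
import re


def gap_run_lengths(aligned_row):
    """Return the length of every run of '-' in one aligned sequence row."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_row)]


def frame_preserving_fraction(ref_row, informant_row):
    """Fraction of gap runs (in either row) whose length is a multiple of 3.

    An alignment with no gaps is treated as fully frame-preserving (1.0).
    """
    if len(ref_row) != len(informant_row):
        raise ValueError("aligned rows must have equal length")
    runs = gap_run_lengths(ref_row) + gap_run_lengths(informant_row)
    if not runs:
        return 1.0
    return sum(1 for n in runs if n % 3 == 0) / len(runs)


def classify_region(pairwise_alignments, threshold=0.8):
    """Label a region from its alignments to several related species.

    pairwise_alignments: list of (ref_row, informant_row) tuples.
    threshold: illustrative cutoff (an assumption, not from the article).
    Returns (label, mean_score).
    """
    scores = [frame_preserving_fraction(r, s) for r, s in pairwise_alignments]
    mean_score = sum(scores) / len(scores)
    label = "coding-like" if mean_score >= threshold else "noncoding-like"
    return label, mean_score


# Toy alignments: every gap in the first region has length 3,
# while the second region contains gaps of length 1 and 2.
coding_example = [
    ("ATGGCT---AAATGA", "ATGGCTGGGAAATGA"),
    ("ATGGCTGGGAAATGA", "ATG---GGGAAATGA"),
]
noncoding_example = [
    ("ATGGC-TAAAT", "ATGGCTTAAAT"),
    ("ATGG--TTAAA", "ATGGCCTTAAA"),
]

print(classify_region(coding_example))
print(classify_region(noncoding_example))
```

A single hand-set threshold on one feature is only a stand-in for a classifier. A realistic version would presumably combine several signatures of selection and calibrate the decision rule on regions of known status.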