Standard Article

Genome signals and assembly

Part 4. Bioinformatics

4.1. Genome Assembly and Sequencing

Specialist Review

  1. Marek Kimmel

Published Online: 15 JAN 2005

DOI: 10.1002/047001153X.g401210

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics


How to Cite

Kimmel, M. 2005. Genome signals and assembly. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.1:3.

Author Information

  1. Rice University, Houston, TX, USA

Publication History

  1. Published Online: 15 JAN 2005


The quality of a genome assembly depends to a large extent on the structure of the genomic sequence, notably on signals such as repeats, polymorphisms, and nucleotide asymmetry, as well as on structural motifs (see Statistical signals) such as protein motifs (see Computational motif discovery), gene promoters, enhancers and suppressors (see Promoter prediction and Exon splicing enhancers), transcription factor binding sites, exon/intron splice junctions, regions of homology between sequences (see IMPALA/RPS-BLAST/PSI-BLAST in protein sequence analysis), and protein docking sites. We address the probabilistic and statistical issues involved, starting from a model of genome assembly as a binomial/Poisson process and from considerations of contig size. We then consider estimation of genome size when repeats are present, and estimation of the total gap length and of the stringency ratio. Next, we consider bubble smoothing in the context of polymorphisms and empirical estimates of the impact of low GC content on stringency. Finally, we refer to some special issues and examples.
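As a rough illustration of the binomial/Poisson view of assembly mentioned above, the sketch below computes a few textbook quantities under the classical Lander-Waterman idealization: reads of equal length placed uniformly at random on a repeat-free genome. That model choice, the function name lander_waterman, and all parameter names are assumptions made here for illustration; they are not taken from the article, which may use different notation and refinements.

```python
import math


def lander_waterman(num_reads, read_len, genome_len, min_overlap=0):
    """Idealized shotgun-assembly statistics (illustrative sketch only).

    Assumes equal-length reads placed uniformly at random on a genome
    without repeats; min_overlap is the hypothetical number of bases two
    reads must share before they are merged into one contig.
    """
    if num_reads <= 0 or read_len <= 0 or genome_len <= 0:
        raise ValueError("num_reads, read_len and genome_len must be positive")
    if not 0 <= min_overlap < read_len:
        raise ValueError("min_overlap must satisfy 0 <= min_overlap < read_len")

    # Coverage: average number of reads covering a given base.
    c = num_reads * read_len / genome_len
    # Fraction of a read that is usable for extending a contig.
    sigma = 1.0 - min_overlap / read_len

    return {
        "coverage": c,
        # Poisson probability that a given base is covered by no read.
        "fraction_uncovered": math.exp(-c),
        # A read ends a contig when no later read overlaps it detectably.
        "expected_contigs": num_reads * math.exp(-c * sigma),
        # Expected contig length in bases under the same idealization.
        "expected_contig_length": read_len
        * ((math.exp(c * sigma) - 1.0) / c + 1.0 - sigma),
    }


if __name__ == "__main__":
    # Hypothetical project: 10,000 reads of 500 bases on a 1 Mb genome.
    for name, value in lander_waterman(10_000, 500, 1_000_000).items():
        print(f"{name}: {value:.4g}")
```

Repeats, polymorphisms, and compositional bias all violate the uniform-placement assumption, which is one reason the estimates of genome size, gap length, and stringency discussed in the article require more careful treatment than this sketch provides.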


  • annotation by words;
  • binomial/Poisson process;
  • EM algorithm;
  • estimating genome length;
  • expected contig size;
  • gap statistics;
  • nucleotide asymmetry;
  • polymorphism;
  • repetitive elements