Standard Article

Repetitive Elements: Bioinformatic Identification, Classification and Analysis

  1. Jerzy Jurka,
  2. Weidong Bao,
  3. Kenji Kojima,
  4. Vladimir V Kapitonov

Published Online: 15 FEB 2011

DOI: 10.1002/9780470015902.a0005270.pub2

eLS

How to Cite

Jurka, J., Bao, W., Kojima, K. and Kapitonov, V. V. 2011. Repetitive Elements: Bioinformatic Identification, Classification and Analysis. eLS.

Author Information

  1. Genetic Information Research Institute, Mountain View, California, USA

Abstract

Multicopy, or repetitive, deoxyribonucleic acid (DNA) is routinely detected and analysed by computer-assisted comparison of genomic DNA with reference databases of repeats. The most representative collection of repetitive elements is ‘Repbase Update’ (RU), which currently contains >15 000 unique entries from diverse eukaryotic species. Most transposable elements (TEs) in RU are represented by consensus sequences built from multiple alignments of individual repeats. These consensus sequences approximate the active TEs that generated the multiple mutated copies found in the genome. The two major current repeat detection and annotation programs, RepeatMasker and CENSOR, both use RU to annotate repeats in eukaryotic genomes. RU is also increasingly used as a master reference library from which custom libraries are created for detecting repeats in newly sequenced genomes. Finally, a combination of different routines can be used to detect repeats that are not similar to those already present in the reference libraries (the de novo approach).
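To illustrate how a consensus sequence can be derived from aligned copies of a repeat, the following is a minimal sketch of a majority-rule consensus. It assumes the copies are already aligned to equal length, with '-' marking gaps. The function name `majority_consensus` and the toy sequences are hypothetical, and this simplification is not the procedure used to build RU entries.

```python
from collections import Counter


def majority_consensus(aligned_copies, gap="-"):
    """Return a majority-rule consensus for equal-length aligned sequences.

    Columns in which the gap character is the most common symbol are
    omitted from the consensus (illustrative assumption).
    """
    if not aligned_copies:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_copies[0])
    if any(len(seq) != length for seq in aligned_copies):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned_copies):
        # Count the symbols observed at this alignment position.
        counts = Counter(symbol.upper() for symbol in column)
        symbol, _ = counts.most_common(1)[0]
        if symbol != gap:
            consensus.append(symbol)
    return "".join(consensus)


# Toy family: four mutated copies of one hypothetical source element.
copies = [
    "ACGTTGCA",
    "ACGATGCA",
    "ACGTTGTA",
    "TCGTTGCA",
]
consensus = majority_consensus(copies)
```

Each toy copy carries at most one substitution, at a different position, so the majority vote in every column recovers the residue of the source element. This is the sense in which a consensus approximates the active TE.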

Key Concepts:

  • Active transposable elements (TEs) produce families and subfamilies of multiple copies in the genome, called ‘interspersed repetitive elements’ or ‘repeats’.

  • Consensus sequences derived from aligned families and subfamilies of repeats are excellent approximations of the active TEs from which they were derived.

  • Consensus sequences are also the preferred reference sequences for screening and annotating repetitive elements, especially highly divergent ones.

  • RepeatMasker and CENSOR are the basic repeat screening and annotation programs; both rely on reference sequence libraries.

  • In the absence of reference sequences, repetitive DNA can be detected by screening for multiple copies and characteristic structural features (de novo approach).
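The idea of screening for multiple copies without a reference library can be sketched by counting short words (k-mers) and keeping those that occur several times. This is only an illustrative assumption about one possible first step. The function name `repeated_kmers`, the parameters `k` and `min_copies`, and the toy sequence are hypothetical, and practical de novo routines also examine structural features and tolerate mutations between copies.

```python
from collections import Counter


def repeated_kmers(sequence, k=8, min_copies=3):
    """Return the k-mers that occur at least `min_copies` times.

    Exact matching only: diverged copies would need approximate
    matching, which this sketch does not attempt.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n >= min_copies}


# Toy "genome": three identical copies of a hypothetical 8-bp unit
# separated by unrelated flanking sequence.
unit = "ACGTTGCA"
genome = "GGG" + unit + "TTCTA" + unit + "CCAAT" + unit + "GAG"
candidates = repeated_kmers(genome, k=8, min_copies=3)
```

In this toy case the repeated unit is reported with a count of three. Words that overlap the flanking sequence fall below the threshold because the flanks differ between copies.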

Keywords:

  • transposable elements (TEs);
  • simple sequence repeats (SSRs);
  • repeat maps;
  • computational biology;
  • reference databases