Sequence complexity of proteins and its significance in annotation

Part 4. Bioinformatics

4.3. Protein Function and Annotation

Short Specialist Review

  1. Birgit Eisenhaber,
  2. Frank Eisenhaber

Published Online: 15 JAN 2005

DOI: 10.1002/047001153X.g403313

Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics

How to Cite

Eisenhaber, B. and Eisenhaber, F. 2005. Sequence complexity of proteins and its significance in annotation. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics. 4:4.3:32.

Author Information

  1. Institute of Molecular Pathology, Vienna, Austria

Publication History

  1. Published Online: 15 JAN 2005

Abstract

If protein sequences are viewed as strings of letters representing amino acid types, the complexity of this text can be evaluated by analyzing residue composition and the repeated occurrence of short motifs, or by applying entropy- or information-like measures. Low-complexity segments make up more than a quarter of the total length of proteins in databases, and their relative share tends to increase as more eukaryotic genomes are sequenced. In contrast, only about 0.5% of the residues in known globular domain structures are located in stretches with strong compositional bias, and these stretches are usually short. Low-complexity segments with a bias toward hydrophobic residues or with a repetitive hydrophobic pattern are typically involved in fibrillar or membrane-embedded protein structures. The structure-forming potential and the molecular function of low-complexity regions composed primarily of polar residues, however, remain insufficiently understood, although mutational variation in some of them has already been implicated in pathological processes.
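
The entropy-like measures mentioned in the abstract can be illustrated with a minimal Python sketch. It computes the Shannon entropy of the residue composition in a sliding window and reports windows that fall below a cutoff. This is not the SEG algorithm or the authors' method; the function names, the window length of 12 residues, and the cutoff of 2.2 bits are assumptions chosen for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy, in bits per residue, of the residue composition."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(sequence, window=12, threshold=2.2):
    """Return (start, entropy) for every window with entropy <= threshold.

    The window length and threshold are illustrative assumptions,
    not values taken from the article.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h <= threshold:
            hits.append((start, h))
    return hits
```

A homopolymeric stretch such as a glutamine repeat has an entropy of zero under this measure, whereas a window in which every residue type is different reaches the maximum for that window length. Note that composition-based entropy ignores residue order, so it does not by itself detect repetitive patterns built from many residue types.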

Keywords:

  • amino acid compositional bias;
  • sequence complexity;
  • low complexity filter;
  • SEG;
  • nonglobular region;
  • glutamine repeats;
  • Huntington disease