• carbohydrate-binding module;
• evolutionary tree;
• glycoside hydrolase family;
• sequence alignment;
• starch-binding domain

Approximately 10% of amylolytic enzymes are able to bind and degrade raw starch. This property is usually conferred by a distinct domain, the starch-binding domain (SBD). SBDs have been classified into families of carbohydrate-binding modules (CBMs). At present, there are six SBD families: CBM20, CBM21, CBM25, CBM26, CBM34, and CBM41. This work concentrates on CBM20 and CBM21. The CBM20 module was believed to be located almost exclusively at the C-terminal end of various amylases, whereas the CBM21 module was known as the N-terminally positioned SBD of Rhizopus glucoamylase. Many nonamylolytic proteins have now been recognized as possessing sequence segments that exhibit similarities with the experimentally observed CBM20 and CBM21 modules. These facts stimulated interest in carrying out a rigorous bioinformatics analysis of the two CBM families. The present analysis showed that the original idea of the CBM20 module being at the C-terminus and the CBM21 module at the N-terminus of a protein should be modified. Although the functionally important tryptophans of CBM20 were found to be substituted in several cases, these aromatic residues and the regions around them are among the best conserved parts of the CBM20 module. They were therefore used as templates for revealing the corresponding regions in the CBM21 family. Secondary structure prediction together with fold recognition indicated that the structure of the CBM21 module should be similar to that of CBM20. The evolutionary tree based on a common alignment of sequences of both modules showed that the CBM21 SBDs from α-amylases and glucoamylases are the closest relatives of the CBM20 counterparts, with the CBM20 modules from the glycoside hydrolase family GH13 amylopullulanases being possible candidates for an intermediate between the two CBM families.