|I Lecticans||Large extracellular proteoglycans containing mainly chondroitin sulfate side chains. Historically divided into three globular domains (N-terminal G1 and G2, and a C-terminal G3) and a central extended region to which glycosaminoglycan chains are attached. G1 and G2 contain 2–4 Link type CTLDs, while G3 contains a canonical CTLD.|
|Vertebrate genomes encode 4 lecticans: aggrecan, brevican, versican and neurocan||Cell adhesion, tissue integration.||Regulate intracellular processing and trafficking of the protein. G3 region shown to promote glycosaminoglycan chain attachment and aggrecan secretion [109,110]. Galactose/fucose specificity shown for aggrecan C-terminal CTLD  as well as protein ligands [112–115].||X-ray structure of complex of tenascin-R and aggrecan G3 CTLD . ❶❷❸➃|
|II Asialoglycoprotein and DC receptors||Type II transmembrane proteins containing a short cytoplasmic tail, a transmembrane domain, an extracellular stalk region, and Ca2+/carbohydrate binding CTLD. Length of stalk region, involved in oligomerization, varies greatly among different members. Large and heterogeneous group, significantly expanded recently.|
|Asialoglycoprotein receptor (ASGR) subgroup: ASGR, MGL|
|Encoded by a gene cluster: Two genes (ASGR1 and ASGR2) found in many mammals encode subunits of heterotrimeric ASGR (‘hepatic lectin’). In rodents, two ASGR2 glycoforms were initially considered separate proteins (rat hepatic lectin (RHL) 2 and 3 or RHL2/3 ). |
Two genes of macrophage galactose-binding lectin (MGL) in mouse (mMGL1  and mMGL2 ), and one gene in human (hMGL, also called human macrophage lectin, HML ). This subgroup is not present in fish . Our sequence analysis showed that the so-called ‘chicken hepatic lectin’  is more similar to DC-SIGN subgroup proteins, consistent with its specificity for mannose-type ligands [121,122].
|ASGR [123,124]: heterotrimer, expressed exclusively on liver parenchyma; binds and internalizes galactose-terminated oligosaccharides of desialylated glycoproteins. After ligand dissociation in acidic lysosomes, recycled to cell surface. One of first C-type lectins discovered [10,125]. |
Rat spermatogenic cells express unusual ASGR2 oligomer (sperm galactosyl receptor), consisting of a full-length and a truncated form lacking C-terminal part of CTLD [18,126].
In contrast to ASGRs, other ASGR-gene cluster members are expressed by macrophages. Recombinant mMGL1 and mMGL2 CTLDs show differences in carbohydrate-binding specificities .
|Ca2+/carbohydrate binding. Primary specificity for galactose. Galactose-binding mechanism unusual for CTLDcps, as subunits have different monosaccharide specificity , and bind the same complex carbohydrate molecule simultaneously. Heterooligomeric structure is essential for high-affinity binding and internalization [128,129].||X-ray structure of ASGR1 CTLD . ❶❷➂❹|
|DC-SIGN subgroup: DC-SIGN, CD23, LSECtin|
|Dendritic cell-specific ICAM-grabbing nonintegrin (DC-SIGN, CD209) and its close homologue DC-SIGNR (DC-SIGN receptor) belong to an actively evolving gene family, with significant differences among mammals. Two genes (DC-SIGN and DC-SIGNR) identified in human, a group of paralogues found in nonhuman primates , and five DC-SIGN homologues found in mouse  (DC-SIGN, SIGNR1, SIGNR2, SIGNR3 and SIGNR4). In the fish genome the DC-SIGN group is also expanded . |
A new protein, LSECtin, encoded by the DC-SIGN gene cluster, was recently characterized .
mDC-SIGN identified as hDC-SIGN orthologue based on proximity of its gene to the mCD23 gene. In human genome hCD23 and hDC-SIGN are closely linked.
|DC-SIGN is responsible for HIV particle transfer and in-trans infection of T-cells . Also a receptor for other pathogens, such as Mycobacterium tuberculosis , hepatitis C virus , Ebola virus , and human cytomegalovirus . |
CD23 [82,138,139] (low affinity IgE receptor) is a glycoprotein expressed on several cell types including lymphocytes, eosinophils, platelets, and macrophages, and also found in a soluble form produced by proteolysis. A key molecule of B-cell activation and growth. Oligomerization via coiled-coil stalk region significantly increases its affinity for IgE .
LSECtin found in sinusoidal endothelial cells of human liver and lymph node ; this is similar to the expression profile of DC-SIGNR.
|Carbohydrate recognition plays a central role in the DC-SIGN binding of pathogens. |
CTLD of CD23 is involved in both protein–protein and protein–carbohydrate interactions. Although human CD23 binds IgE in a carbohydrate-independent manner , recognition of another ligand (CD21) and CD23-induced cell aggregation require Gal-terminated glycan chains [142–144]. Predicted Ca2+-binding site 2 motifs of human CD23 CTLD are EPT and WND (EPN in mouse, rat and horse), which are typical for mannose-binding CTLDs, so galactose specificity of CD23 is unexpected.
The CTLD of LSECtin contains an EPN motif and, as expected, preferentially binds mannose-type ligands.
|X-ray structures of DC-SIGN and DC-SIGNR complexed with oligosaccharides [53,96]. ❶❷❸➃|
NMR structure of human CD23 defining its interactions with IgE and CD21 . ❶➁➂➃
|Macrophage receptors: MCL, Mincle, DLEC, DCIR, DCAR, Dectin-2|
|Gene cluster [146,147] (human 12p13; mouse 6F2), closely linked to NK cell receptor complex (group V). Encodes several CTLDcps expressed by macrophages and dendritic cells: macrophage C-type lectin (MCL ), macrophage-inducible C-type lectin (Mincle ), dendritic cell immunoreceptor (DCIR ), dendritic cell lectin (DLEC or BDCA-2 [151,152]) and dendritic cell-associated lectin-2 (Dectin-2 [153,154]). Rodent-only: DCIR paralogues (DCIR2-DCIR4 ), and a dendritic cell immunoactivating receptor (DCAR ).||Subgroup members discovered only relatively recently, and their functions are poorly characterized .||The only available information on carbohydrate-binding properties comes from two studies on Dectin-2, which gave conflicting results: in one case Ca2+-dependent binding to mannose was observed , while in the other the protein did not bind carbohydrate . In all group members, a putative Ca2+-binding site 2 motif is present, although in some cases it has an unusual sequence (e.g. EPK, ESN, EPD in rat, mouse and human MCL, respectively).|| |
|Langerin and Kupffer cell receptors|
|Cluster of two genes (human 2p13, mouse Ch 6D1), closely linked to NK cell receptor complex (group V): Langerin (CD207) and Kupffer cell receptor (KuCR). |
KuCR locus in human genome lacks 3′-terminal exon, which truncates CTLD at beginning of long loop region; hence, suggestion that human receptor is a pseudogene . However, a full-length cDNA (AK096429) for hKuCR is now available in GenBank (63% identity with rat).
|Langerin is an endocytic receptor uniquely expressed by Langerhans cells and associated with Birbeck granules in human  and mouse . Long stalk region involved in trimerization (coiled-coil). |
Kupffer cell receptor structure is similar to Langerin, but expressed in liver and functions as endocytic receptor for fucose-terminated glycoproteins [160,161].
|CTLD of Langerin has typical motifs associated with mannose binding, and protein indeed shown to bind mannose-group monosaccharides . |
Rat KuCR contains a galactose-type QPD motif, but interestingly it binds fucose with relatively high affinity .
|Scavenger receptor with a CTLD (SRCL) |
|SRCL (human Ch 18p11 [163,164]) has unusual structure for group II proteins. Contains a collagen domain and coiled-coil region, and thus was described as a placental collectin (CL-P1 ; HUGO name COLEC12). However, except for collagen region, the domain structure is analogous to other group II CTLDcps; our phylogenetic analysis of CTLD alignments confidently places SRCL into group II.||Endocytic receptor; binds Gram-negative and Gram-positive bacteria, yeasts, oxidized low density lipoprotein .||CTLD of SRCL is similar to ASGR CTLDs, including all elements shown to contribute to high-affinity galactose binding by ASGR (QPD motif, a tryptophan and glycine-rich loop ). SRCL indeed binds galactose-type ligands  and has unusually high selectivity for glycans containing the LewisX epitope .|| |
|III Collectins||Soluble CTLDcps that contain a collagen domain and function as the first line of the innate immune defense .|
|Serum mannose-binding protein(s) (MBP) and pulmonary surfactant proteins (PSP). Other members: human liver collectin CL-L1  (unusual as only found in cytoplasm). Bovidae collectins: conglutinin, CL-43 and CL-46; genes physically linked with MBP and PSP . Highly conserved vertebrate collectin identified in the Fugu whole-genome study .||Innate immunity: recognition of pathogen carbohydrates, complement activation via lectin pathway, activation of phagocytosis [e.g. 171–173].||Unique binding specificity and spatial organization of CTLDs in oligomeric complexes allow collectins to recognize ordered arrays of carbohydrates specific to the surfaces of microorganisms (pathogen associated molecular patterns (PAMPs)) [174,175]. MBP and PSP also bind nucleic acids via both the CTLD and collagen region , and PSP binds phospholipids via the CTLD . |
As discussed in text, MBP wild-type and mutant structures were used in the classical studies of Drickamer, Weis and coworkers on the mechanism of carbohydrate recognition by CTLDs.
|Numerous X-ray structures for wild-type and mutant MBP and complexes [e.g. 14,86,89,178]. ❶❷[❸/➂]➃|
X-ray structures for rat PSP-A  and human PSP-D, including sugar complex [55, 61]. ➀❷➂➃,❶❷➂➃, ❶❷❸➃
|IV Selectins||Type I transmembrane proteins containing CTLD, EGF and 2–9 complement control protein (CCP) domains.|
|Three selectins: L- (leukocyte), P- (platelet) and E- (endothelial); encoded by a compact gene cluster; this organization is conserved among vertebrates .||Cell adhesion [179,180]. Involved in the first step (initial attachment (tethering) and subsequent movement (rolling)) of leukocyte recruitment from the blood stream into sites of inflammation and lymphatic tissues.||CTLDs bind the carbohydrate sialyl LewisX (SLeX) with low affinity; different high-affinity glycoprotein ligands also identified [reviewed in 180–182]. Binding occurs via an extended site on the CTLD; in addition to fucose binding at the primary site, electrostatic and hydrogen-bond interactions are formed with other monosaccharide moieties of SLeX .||X-ray structures of E- and P-selectin complexes with SLeX and other ligands . ➀❷➂➃|
|V ‘NK cell receptors’||Non-Ca2+-binding type II transmembrane CTLDcps. Despite the common group definition of ‘type II NK cell receptors’, many are not (exclusively) expressed by NK cells: CD72 is expressed on B-cells ; CD69 on various hematopoietic cells ; KLRG1/MAFA on basophils and NK cells ; LOX-1 on vascular endothelial cells [186,187]; DCAL1, CLEC-1, KLRL1 on dendritic cells [188,189]; Dectin-1 on macrophages and dendritic cells [190,191]; MDL-1 exclusively on monocytes and macrophages ; and CLEC-2 in liver .|
|This group is evolutionarily young and unambiguously identified only in higher vertebrates. Mostly encoded by a single large ‘NK gene cluster’  (human Ch 12p13, mouse Ch 6F3), but uniquely CD72 is on Ch 9 in human and Ch 4 in mouse , and MDL-1 on Ch 7p33 in human . |
Substantial variability between rodents and human: large mouse gene family (Ly49-A – Ly49-x, official symbols KLRA1-KLRA28; some are alleles), even larger in rat (at least 36 genes ) but only a single gene (Ly49L/KLRA1) in human, encoding a truncated protein, which lacks the distal part of CTLD .
|Majority belong to killer cell lectin-like receptor [KLR; unofficial names NKG2 (NK cell group 2) and Ly49] group. Variously associated with inhibition or activation of NK cells, although exact function of many is unknown. KLRs form homo- (e.g. KLRK1/NKG2D) or heterodimers (e.g. CD94 and KLRC1/NKG2A).||Most have protein ligands. |
Some are multivalent and bind both carbohydrate and protein ligands [81,107]; most striking example is Dectin-1, characterized as a macrophage β-glucan receptor . Polysaccharide binding to Dectin-1 is cation-independent, and mutagenesis studies suggest binding site is not at typical CTLD carbohydrate-binding site .
|X-ray structures of NKG-2D (human and mouse) with their MHC-like ligands [197,198], Ly49C , Ly49A , Ly49I , CD69 , CD94 , and LOX-1 . ➀➁➂➃|
|VI Multi-CTLD endocytic receptors||Type I transmembrane proteins with an N-terminal ricin-like domain, a fibronectin type 2 domain and 8 or 10 (Dec205) CTLDs in the extracellular domain, and a short cytoplasmic domain.|
|4 members (from fish to human): Endo180, phospholipase A2 receptor (PLA2R), macrophage mannose receptor (MManR), and Dec205 (reviewed recently in ).||Recycling endocytic receptors.||Monosaccharide binding demonstrated only for Endo180  and MManR ; in both cases, activity is limited to a single domain (4 and 2, respectively), other domains being required for high-affinity binding of multivalent ligands [101,206]. Most CTLDs of group VI proteins do not contain residue motifs associated with Ca2+-binding site 2.||X-ray structure of MManR CTLD 4 . ➀❷➂➃|
|VII Reg group||CTLD preceded by a short N-terminal peptide. In the initial classification this group included all other soluble single-CTLD proteins.|
|Four subgroups [207, 208] with a gene cluster on human 2p12 (mouse 6C3) and a single gene (Reg4) on human 1p12 (mouse 3F3). The group outlier Reg4 has much less sequence similarity to the other group members. As discussed in text section on snake CTLDcps, Reg4 proteins represent the ancestral member of the mono-CTLD group.||First member of family, now known as Reg1, identified simultaneously by several groups in different functional contexts. Reflected in alternative names : pancreatic stone protein (PSP), as was isolated from pancreatic stones; lithostathine, as was considered an inhibitor of calcite crystal growth [77,210] (not confirmed by other studies [211,212]); and regenerating gene (Reg), as overexpression was observed in regenerating pancreatic islets . |
No Reg family member contains a characteristic Ca2+-binding motif. Although involvement has been shown in various physiological and pathological processes, the molecular mechanisms of action are largely unknown . Lithostathine has also been studied for its ability to form amyloid fibrils and its possible involvement in early stages of Alzheimer's disease development [215–217].
|X-ray structures of polymerized lithostathine protofibrils , and monomer [218,219]. ➀➁➂➃|
|VIII Chondrolectin, Layilin||Type I transmembrane proteins with a single CTLD.|
|Two members: Layilin and Chondrolectin (CHODL/MT75).||Layilin expressed in wide range of cell lines and tissues  and may function as either an endocytic receptor or adhesion molecule .||CTLDs of Layilin and CHODL contain a motif associated with Ca2+-binding, although the motif is unusual (EPS). Layilin binds hyaluronan via the CTLD and intracellular proteins from the ERM family (talin and radixin). No hyaluronan binding could be detected for CHODL .|| |
|IX Tetranectin||Soluble proteins with a long N-terminal α-helical domain involved in coiled-coil formation. This structure resembles the structure of the C-terminal domain of collectins. SCGF also contains an N-terminal mucin-like Ser/Thr rich region.|
|Three identified members: tetranectin, stem cell growth factor (SCGF, LSCLCL [223–226]) and CLECSF1. Similarity between group IX and group III is further supported by gene structure (intronless CTLD) and molecular phylogeny reconstruction based on CTLD alignment. SCGF identified in two forms (α and β), difference (78 residues ) cannot be explained by alternative splicing, as located within exon 3 encoding the CTLD .||Tetranectin is involved in tissue remodeling, activates plasminogen (main ligand), and is expressed in developing tissues . |
Expression data only available for CLECSF1 . Shark homologue reported (called shark tetranectin [228,229]).
SCGF detected in culture medium of a human myeloid cell line ; mitogen [223,231].
|All contain a motif that would satisfy requirements for Ca2+ but not carbohydrate binding in CTLD. Tetranectin binds Ca2+ and carbohydrate (heparin) independently . Ca2+ competes with plasminogen for binding to tetranectin CTLD . |
As the truncated (SCGFβ) form was reported to be an active growth factor, it is unlikely that this activity is mediated by CTLD.
|X-ray structure of tetranectin in Ca2+-bound  and Ca2+-free forms . ❶❷➂➃|
|X Polycystin 1||Large multidomain protein with 11 membrane-spanning regions, thought to be involved in cell–cell or cell–matrix interactions. The extracellular domain of PKD1 is ∼3000 amino acids long and contains 16 PKD domains, which have an Ig-like fold , a leucine-rich repeat domain, a putative carbohydrate-binding WSC domain, a CTLD and a domain homologous to the sea urchin receptor egg jelly protein 1 (suREJ1 ).|
|PKD1 (polycystin-1) has several homologues in vertebrates, but not all of them contain a CTLD. Its sea urchin homologue (suREJ1) lacks most of PKD1's domains and contains only one TM region, but two CTLDs . suREJ1 paralogues (suREJ2 and suREJ3) isolated later contain one CTLD [235,236]. Thus, the PKD1 group may be the most ancient group of vertebrate CTLDcps as it can be traced back to the early evolution of deuterostomes. Of several PKD1 paralogues identified in mouse and human, only two (PKD1L2 and PKD1L3) contain CTLDs .||PKD1 initially identified as one of two genes in which mutations are responsible for onset of autosomal dominant polycystic kidney disease (ADPKD) [238,239]. The function of the CTLD in polycystin 1, as well as the function of the protein itself, remains unknown.||Study of a GST-fused recombinant PKD1 CTLD showed binding to unsubstituted carbohydrate matrices (Sepharose and Sephadex G25), as well as to several extracellular matrix proteins, with high affinity and in a Ca2+-dependent manner . This is intriguing, taking into account the sequences of the Ca2+-binding site 2 motifs (EPH and WCNT). |
Unfortunately, the results cannot be interpreted unambiguously as the GST domain was not cleaved from the CTLD.
|XI Attractin||Glycoprotein expressed in transmembrane or soluble form due to alternative splicing ; contains a CUB domain (found in many developmentally regulated proteins), four EGF-like domains, and four PSI domains (found in plexins, semaphorins and integrins).|
|Orthologue of attractin found in C. elegans, but lacks the CTLD . Interestingly, a CTLD occurs very frequently in combination with a CUB domain in C. elegans. A well-conserved vertebrate attractin paralogue can be found in sequence databases (hypothetical protein KIAA0534); no description of this protein has been published.||Expressed by hematopoietic cells . In mouse associated with the mahogany mutation, which affects the melanocortin signaling pathway ; and in rat, the zitter mutation in tremor rats .||Unknown.|| |
|XII Eosinophil major basic protein (EMBP)||Soluble protein containing a highly basic CTLD and an acidic pro-peptide, which is cleaved off in the active form.|
|Paralogue of EMBP, EMBP-2, identified in mouse  and human [246,247].||Estimated pI of 11; major component of crystalloid core of eosinophil-specific granules; functions as a cytotoxic agent against parasites. Despite its highly basic nature, the CTLD of EMBP has a typical CTLD fold . Ligand-binding functions are unclear, although the reported binding of heparin may be involved .||X-ray structure of EMBP . ➀➁➂➃|
|XIII DGCR2||Type I transmembrane protein containing vWF, CTLD and LDL domains in extracellular region.|
|One member, DGCR2/IDD/Sez12, which was localized in the DiGeorge syndrome (OMIM 188400) critical region [248–250].||Function of protein unknown.||CTLD of DGCR2 does not contain characteristic Ca2+-binding motifs.|| |
|XIV Thrombomodulin||Type I transmembrane proteins with a short intracellular domain and an extracellular part that includes a CTLD, a domain referred to as hydrophobic or sushi-like, one or more EGF domains, and low complexity Ser/Thr-rich regions, which are targets for O-glycosylation.|
|Four members: thrombomodulin (TM), Endosialin/TEM1, CD93/C1qRP/AA4 and a novel member we named CETM .||All expressed on vascular endothelium. Endosialin only found on tumor vascular endothelium [251–253, but see 254], while C1qRP and TM expressed more broadly. Based on EST data, CETM is ubiquitously expressed. TM is a very well characterized CTLDcp, due to its importance in the coagulation pathway. Thrombin binding to EGF domains 5 and 6 of TM promotes protein C activation (up to 1000×), which makes TM a potent tissue anticoagulant . TM and CD93 are involved in cell adhesion and inflammation control [106,256,257].||Thrombomodulin fragment referred to as ‘lectin domain’, which includes CTLD and the following 67-residue ‘hydrophobic region’, is required for Ca2+-dependent thrombomodulin-mediated cell adhesion; this is inhibited by mannose and chondroitin sulfate A or C . However, the TM CTLD does not contain a typical carbohydrate-binding motif. |
The CETM CTLD sequence contains a putative carbohydrate-binding motif (EPN), normally associated with mannose specificity .
|XV Bimlec||Type I transmembrane protein with neck region and CTLD in extracellular region.|
|New group created as predicted in our Fugu whole-genome analysis , and supported by a sequenced cDNA in the database (named ‘Bimlec’), linked to DEC-205.||Function unknown. Expressed as fusion protein with DEC-205 in Hodgkin's lymphoma cells .|| || |
|XVI SEEC||Soluble protein containing SCP, EGF, EGF and CTLD domains (SEEC ). The sperm-coating glycoprotein (SCP) domain, which is present in organisms from yeast and plants to mammals, but whose function is unknown , is rarely observed in combination with other domains in proteins. SCP/CTLD combination is observed in only one other known protein – Nowa from hydra .|
|New group created as predicted in our Fugu whole-genome analysis ; well conserved between human and fish. Supported by available cDNAs.||Function unknown.||CTLD has potential Ca2+/carbohydrate-binding motif (QPD) characteristic of galactose specificity.|| |
|XVII CBCP/Frem1/QBRICK||Large proteoglycan (∼2100 residues) containing a set of chondroitin sulfate proteoglycan (CSPG) repeats (homologous to the NG2 ectodomain ), a calcium-binding Calx-β domain and a CTLD. CBCP (Calx-β and CTLD containing Protein )|
|New group created as predicted in our Fugu whole-genome analysis . Novel member of protein family not reported previously to have members containing CTLDs; examples of this family include human MCSP/CSPG4  and mouse Fras1  genes.||Gene independently discovered twice experimentally in mouse and called Frem1  and QBRICK . Found to be widely expressed in the developing embryo in regions of epithelial/mesenchymal interaction and epidermal remodeling, where it appears to act as a mediator of basement membrane adhesion . Also found as an adhesive ligand of the basement membrane recognized through integrins by cells in embryonic skin and hair follicles .||CTLD of CBCP lacks Ca2+-binding residues, and its long loop region is short, resembling that of group V CTLDs.|| |