The carbohydrate recognition domain of collectins


  • Edwin J. A. Veldhuizen,

    1. Department of Infectious Diseases and Immunology, Division of Molecular Host Defence, Faculty of Veterinary Medicine, Utrecht University, The Netherlands
  • Martin van Eijk,

    1. Department of Infectious Diseases and Immunology, Division of Molecular Host Defence, Faculty of Veterinary Medicine, Utrecht University, The Netherlands
  • Henk P. Haagsman

    1. Department of Infectious Diseases and Immunology, Division of Molecular Host Defence, Faculty of Veterinary Medicine, Utrecht University, The Netherlands

H. P. Haagsman, Department of Infectious Diseases and Immunology, Division of Molecular Host Defence, Faculty of Veterinary Medicine, Utrecht University, PO Box 80.175, 3508TD Utrecht, The Netherlands
Fax: +31 30 2532365
Tel: +31 30 2535354


Collectins are effector molecules of the innate immune system that play an important role in the first line of defence against bacteria, viruses and fungi. Most of their interactions with microorganisms are mediated through their carbohydrate recognition domain (CRD), which binds in a Ca2+-dependent manner to glycoconjugates. This domain is a well-known structure that is present in a larger group of proteins comprising the C-type lectin domain family. Collectins form a subgroup within this family based on the presence of a collagen domain and the trimerization of CRDs, which are essential for the ligand-binding properties of these proteins. The ligand specificity among the nine collectin members is significantly different as a result of both the structural organization of the trimers and specific sequence changes in the binding pocket of the CRD. In addition, some collectin members have additional features, such as N-linked glycosylation of CRD residues and additional loop structures within the CRD that have a large impact on their interaction with the glycoconjugates present on microorganisms or host cells. The availability of crystal structures of three members of the collectin family (surfactant proteins A and D and mannan-binding protein) provides an important tool for addressing the impact of these CRD differences on ligand binding. In this review, the structural differences and similarities between the CRDs of collectins are summarized and their relationship with their ligand-binding characteristics is discussed.




Abbreviations: CL-K1, collectin kidney 1; CL-L1, collectin liver 1; CL-P1, collectin placenta 1; CRD, carbohydrate recognition domain; CTLD, C-type lectin domain; MBL, mannan-binding lectin; MBP, mannan-binding protein; pSP-D, porcine SP-D; rMBP, rat MBP; SP-A, surfactant protein A; SP, surfactant protein.


Collectins are C-type (Ca2+-dependent) lectins that are important effector proteins of the innate immune system. Their carbohydrate recognition domain (CRD) is a carbohydrate-binding C-type lectin domain (CTLD) that places collectins in a subgroup of the large family of C-type lectins [1]. Their functions (depending on the specific collectin) include aggregation and neutralization of microorganisms, induction of opsonization and activation of phagocytosis, complement activation, modulation of inflammatory responses, and modulation of the adaptive immune system. The importance of collectins in vivo has been elegantly shown in genetically-modified mice lacking either the collectins mannan-binding protein (MBP) or surfactant protein A or D (SP-A, SP-D) [2–4]. These mice were more susceptible to both viral and bacterial infections.

In addition to the CRD, collectins are characterized by three other functional domains (Fig. 1A). All collectins contain a small N-terminal domain followed by a collagen domain of variable length. This collagen domain is linked to the CRD through a small neck region. All four domains contribute to the diverse functionality of collectins. Most notably, the neck region induces trimerization of the protein, further stabilized by the collagen domain, resulting in three CRDs being in close contact with each other (Fig. 1B). This enables these proteins to recognize glycan patterns on microorganisms or host cells, instead of separate isolated glycan structures. The N-terminal domain of collectins enables higher oligomerization to either octadecamers (six trimers) or dodecamers (four trimers). The oligomerization of the collectin trimers gives rise to either a cruciform structure in which the trimers are spatially separated, as seen in SP-D and conglutinin, or results in a conformation resembling a ‘bouquet of flowers’, where the six trimeric CRDs are close together (SP-A and MBP). For SP-D, higher oligomeric forms are also observed, creating so-called fuzzy balls, whereas the bovine collectin CL-43 has not been shown to form higher oligomeric structures than trimers (Fig. 1B). It should be stressed that, in vivo, a mixture of trimers and higher oligomeric structures is usually observed for collectins.
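The stoichiometry described above reduces to simple arithmetic: each trimer contributes three CRDs, so the number of CRDs in an assembly is three times its number of trimers. The short Python sketch below tabulates this for the assemblies named in the text. The function and dictionary names are illustrative only, and the 'fuzzy ball' forms of SP-D are left out because the text gives no fixed trimer count for them.

```python
# Illustrative sketch: names are hypothetical, counts follow the text.
CRDS_PER_TRIMER = 3

# Number of trimers per assembly, as stated in the text.
TRIMERS_PER_ASSEMBLY = {
    "trimer": 1,
    "dodecamer": 4,
    "octadecamer": 6,
}


def crd_count(assembly: str) -> int:
    """Return the number of CRDs (one per polypeptide chain) in an assembly."""
    return CRDS_PER_TRIMER * TRIMERS_PER_ASSEMBLY[assembly]


for name in TRIMERS_PER_ASSEMBLY:
    print(f"{name}: {crd_count(name)} CRDs")
```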

Figure 1.

 Structures of the collectins. (A) Graphical comparison of the primary structures of the collectins. Shown are the human collectins aligned with rMBP (i.e. the first collectin that was structurally characterized), the bovine-specific collectins, chicken SP-A (cSP-A) and pSP-D. Both cSP-A and pSP-D have unique structural features within their corresponding classes. Colored boxes indicate domains and symbols represent important structural elements/residues, as indicated in the key. The collectin sequences were represented based upon the National Center for Biotechnology Information/Genbank accession codes: AAA98781 (rMBP); CAB56124 (hMBP); 1111285A (hSP-A); AAK97540 (cSP-A); CAA46152 (hSP-D); AF132496 (pSP-D); BAA81747 (hCL-L1); BAF43301 (hCL-K1); P23805 (bCongl); P42916 (bCL-43); and AAM34743 (bCL-46). (B) Assembly of the collectins. The structural basis of all collectins is a trimer and, depending on the type of collectin, these trimers can assemble into different higher-order oligomers, as illustrated for several well-characterized collectins. *Number of amino acids of the mature polypeptide chain.

Currently, the collectin subfamily contains nine different members; these are MBP (also referred to in the literature as mannan-binding lectin; MBL); SP-A and SP-D; three relatively recently discovered collectins, collectin kidney 1 (CL-K1), collectin liver 1 (CL-L1) and collectin placenta 1 (CL-P1); and three bovine-specific serum collectins, CL-43, CL-46 and conglutinin. CL-L1 and CL-K1 are also known as collectin-10 and collectin-11, respectively, because expression is not restricted to these organs. Similarly, CL-P1 is mostly referred to as scavenger receptor C-type lectin. It should be noted that it is doubtful whether CL-P1 truly belongs to the collectin subfamily [5]. CL-P1 is the only collectin that contains a type II transmembrane domain and, in evolutionary terms, CL-P1 falls within the family of asialoglycoprotein receptors. The structures of the CRDs of MBP and the surfactant proteins have been resolved and many of their biological functions have been determined. For the other collectins, assumptions on structure and function are often based on similarities with the better-studied family members.

Several reviews provide a more detailed description of functional aspects and the overall structures of collectins [6–10]. Accordingly, in this review, we focus on the specific characteristics of the carbohydrate recognition domain of collectins. The general features of C-type lectin domains of the whole CTLD family are discussed first, after which the specific characteristics of collectins are examined in more detail. In line with the available literature, we use the specific term CRD to describe the carbohydrate-binding C-type lectin domain of collectins, whereas CTLD is used when general features of this domain of all C-type lectin domain family members are described.

The C-type lectin domain superfamily

General structural aspects of the C-type lectin domain

The CTLD family contains a large group of animal proteins, of which the common feature is structural and not functional, although they were originally characterized as Ca2+-dependent saccharide-binding proteins. In other words, all family members contain the C-type lectin domain, although their role in vivo is very diverse. Indeed, many members of the CTLD family actually lack the saccharide-binding activity that was found in the initial members of the family. The CTLD family initially comprised seven groups but has now grown, with the addition of ten groups, to a total of 17. The classification into groups I to VII is based on the protein architecture and corresponds well to their functionality [11]. The ten extra groups added are purely based on the structure of the protein and functional data are largely lacking. Some of the better-known CTLD subfamilies besides collectins (group III) are the asialoglycoprotein receptors (group II), the selectins (group IV) and the natural killer cell receptors (group V) [12].

Structural characteristics of the CTLD fold

The CTLD is a very well conserved domain of 110–140 amino acids, which is characterized by an overall double lobe structure as is shown for SP-D in Fig. 2A. The first CTLD structure resolved was that of rat MBP (rMBP) [13]. Its CTLD contains five β-strands forming two β-sheets (β1 and β5; β2, β3 and β4), two α-helices and four loops, which are conserved in all other members of the CTLD family. The structure of rMBP-A is used as a template to describe other CTLD structures. However, some family members with the same overall shape contain additional small secondary structure elements in their domains. This causes some confusion when comparing structures of CTLDs because there does not appear to be general agreement on the numbering of these elements [12].

Figure 2.

 Structure of the CTLD. (A–D) Representations of the CTLD architecture as illustrated by ribbon diagrams of the crystal structure of hSP-D (coordinates are from the Protein Data Bank: accession no. 3G83). In addition to the CTLD, a small part of the neck α-helical region is shown in pink, three calcium ions as magenta spheres, a bound α-1,2 dimannose as a stick representation in yellow, and the C-terminus of the CTLD highlighted in dark blue (A, B, D). (A) The two lobes of the CTLD shown in green and red. (B) The long loop of the CTLD (green) and location of the two disulfide bridges (red). (C) Close-up of the carbohydrate-binding site showing interactions between key residues of the CTLD and the carbohydrate ligand with the lectin site Ca2+ ion. Key residues are shown in stick representation with carbons colored and residues numbered as in (E). Interactions with the Ca2+ ion are shown as eight dotted lines, which are colored corresponding to the residue/ligand involved. (D) The two binding-groove regions of the CTLD are depicted in green and the location of the EPN motif is indicated in red. (E) Amino acid sequence alignment of the C-terminal part (residues 150–221 of rMBP) of the CTLDs for all collectins shown in Fig. 1. Dots indicate residues in common with the rMBP sequence. Dashes indicate spaces inserted to maximize the identity across the alignment. A consensus sequence (con) with disulfide bridges (dotted lines) and a majority sequence (residues identical in six or more collectins) is shown below the alignment (maj). Asterisks indicate the residues necessary for coordination of Ca2+ ions and hydroxyl groups of oligosaccharides (colored as in C). Predicted N-glycosylated asparagine residues (shaded in pink), the two groove regions (shaded grey) and amino acid insertions in pSP-D and the three bovine collectins (bold) are indicated.

The CTLD structure is stabilized by four conserved cysteines that form two disulfide bridges (highlighted in Fig. 2B). The C1-C4 pair links the N-terminus to the C-terminus of the fold, whereas C2-C3 is an essential part of the long loop (Fig. 2B). This long loop is involved in Ca2+ and saccharide binding and is present in all members of the collectin subfamily. Some CTLD subfamilies lack this loop (these are called compact CTLDs), although proteins of these subfamilies can still bind saccharides through a Ca2+-independent mechanism.

In a study by Zelensky and Gready [14], the structures of 37 C-type lectin domains available at that time were compared and it was found that several consensus sequences were required for the specific CTLD fold. These include two hydrophobic cores formed by aliphatic and/or aromatic amino acids found in all CTLDs, and a small hydrophobic core in the long loop (not present in compact CTLDs). The first hydrophobic core is formed by interaction of multiple amino acids in five distinct regions within the protein core, whereas the second core is formed by four aliphatic residues in the first α-helix and the second β-strand in the structure. The third hydrophobic core is formed from residues in the long loop and from connecting β-strands. The specific residue numbers of the 18 residues involved in these hydrophobic cores are provided by Zelensky and Gready [14]. The long loop also contains a highly conserved WIGL motif (corresponding to amino acids 156–159; FLGI motif in rMBP-A), which is part of all three hydrophobic cores and can be considered an integrating component of the CTLD structure.

The Ca2+- and carbohydrate-binding sites of CTLDs

Depending on the specific CTLD studied, up to four Ca2+-binding sites can be identified with different affinities for the metal ion (three of which are present in human SP-D and depicted in Fig. 2A). Depending on the crystallization conditions, all of these Ca2+-binding sites have been observed to be occupied by Ca2+ or other cations. Ca2+-binding sites 1, 3 and 4 are mainly considered to be involved in the stability of the domain. Removal of Ca2+ from these sites results in higher susceptibility to proteases and induces conformational changes [15–18]. Ca2+-binding site 2 is directly involved in binding of saccharides and is responsible for the Ca2+-dependency of ligand binding. For this reason, several crystallographic studies refer to this site as Ca2+-binding site 1, which creates a confusing inconsistency in the nomenclature used in the literature.

C-type lectins can be divided into two groups based on their affinity for monosaccharides: mannose- and galactose-binding lectins. The calcium ion in binding site 2 forms six coordination bonds with carbonyls of highly-conserved specific amino acids in the CTLD and two coordination bonds with either water molecules when no substrate is present or with two hydroxyl groups of a bound monosaccharide (Fig. 2C). The first two residues are part of the characteristic EPN motif (amino acids 185–187 in rMBP-A) present in the long loop of mannose-binding CTLDs (highlighted in Fig. 2D). The carbonyl groups in the side chains of Glu and Asn directly bond with Ca2+, whereas the cis-proline forms the backbone kink required for this coordination. Three more coordination bonds are formed by the WND motif (amino acids 204–206) present in the β4 strand of the CTLD. The carbonyl side chain of Asn205 provides one coordination bond, whereas Asp206 interacts with Ca2+ with both its side chain and backbone carbonyl group. The carbonyl from the residue preceding cysteine C2 at the end of the long loop (Glu193 in rMBP-A) completes the coordination (Fig. 2C). Upon binding of saccharide, the two water molecules are replaced by the 3- and 4-hydroxyl groups of, for example, mannose. These hydroxyl groups also form hydrogen bonds with four (all but the Asp of the WND motif) carbonyl side chains that coordinate the calcium ion.
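The coordination scheme described above can be tallied explicitly. The following Python sketch lists the six protein-derived donor groups of Ca2+-binding site 2 in rMBP-A exactly as named in the text, then adds the two exchangeable positions (water molecules, or the 3- and 4-hydroxyl groups of mannose). The data structure and function names are hypothetical and serve only as a bookkeeping aid, not as a structural model.

```python
# Ca2+-binding site 2 of rMBP-A as described in the text: (residue, donor group).
PROTEIN_LIGANDS = [
    ("Glu185", "side-chain carbonyl"),  # E of the EPN motif
    ("Asn187", "side-chain carbonyl"),  # N of the EPN motif
    ("Asn205", "side-chain carbonyl"),  # N of the WND motif
    ("Asp206", "side-chain carbonyl"),  # D of the WND motif
    ("Asp206", "backbone carbonyl"),    # same residue, second bond
    ("Glu193", "carbonyl"),             # residue preceding cysteine C2
]


def exchangeable_ligands(sugar_bound: bool) -> list:
    """Return the two non-protein ligands of the Ca2+ ion."""
    if sugar_bound:
        # On saccharide binding, two hydroxyl groups replace the waters.
        return ["mannose 3-OH", "mannose 4-OH"]
    return ["water", "water"]


def coordination_number(sugar_bound: bool) -> int:
    """Total number of coordination bonds to the site 2 Ca2+ ion."""
    return len(PROTEIN_LIGANDS) + len(exchangeable_ligands(sugar_bound))


print(coordination_number(sugar_bound=False))
print(coordination_number(sugar_bound=True))
```

In both states the tally is six protein bonds plus two exchangeable bonds, matching the eight dotted lines described for Fig. 2C.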

The EPN motif in the long loop of CTLDs is the main determinant for the specificity of monosaccharide binding. Mannose-binding C-type lectins all contain the EPN motif. The position of the carbonyls makes it bind most efficiently to a monosaccharide containing two equatorial hydroxyl groups at C3 and C4, as is the case in mannose and glucose. Interaction with the equatorial 2- and 3-hydroxyl groups of L-fucose is also favored. However, in C-type lectins with a preference for galactose, the EPN motif is changed into QPD. The carbonyls of Gln and Asp interact preferentially with an equatorial hydroxyl at C3 of the saccharide and an axial hydroxyl group at C4, as found in galactose. Although this prediction of saccharide specificity based solely on the EPN motif is a strong simplification and neglects other structural factors involved, it is still widely used and very often correct, although exceptions do exist [19]. Evidence for the utility of the EPN rule is provided in an elegant study by Drickamer [20], who was the first to show that a mutational change of the EPN motif into QPD completely changed the ligand specificity of MBP from mannose- to galactose-binding. Similar experiments using other C-type lectins have confirmed the reproducibility of this concept [21,22].
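The EPN rule can be written as a small lookup. The Python sketch below encodes only the two cases stated in the text (EPN predicts mannose-type binding, QPD predicts galactose-type binding) and returns "undetermined" for any other triplet. The function name is hypothetical. As the text stresses, the rule is a simplification with known exceptions, so the output is a first guess and does not replace binding data.

```python
def predict_specificity(motif: str) -> str:
    """Apply the simplified EPN rule to a three-residue long-loop motif."""
    motif = motif.strip().upper()
    if motif == "EPN":
        return "mannose-type"
    if motif == "QPD":
        return "galactose-type"
    # The rule makes no statement about other triplets.
    return "undetermined"


for triplet in ("EPN", "QPD", "EPS"):
    print(triplet, "->", predict_specificity(triplet))
```

Returning "undetermined" for other triplets is deliberate: the rule as stated covers only EPN and QPD.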


Structural aspects of the collectin carbohydrate recognition domain

When the sequences of the CRDs of the nine collectin members are compared (Fig. 2E), all characteristics for classical CTLDs are observed (Cys1 is missing in Fig. 2E because only the C-terminal sequence is shown for clarity). The structurally important cysteines are conserved throughout all collectins and all five Ca2+-coordinating residues are present in the human and bovine collectins. All three hydrophobic cores are present, although identification is often based on sequence similarity instead of identity. Some deviations are found in the WIGL motif, which is less conserved. Rat MBP-A contains an FLGI motif, which still contains the characteristics of a WIGL motif but, in SP-D and the bovine collectins, the extremely well conserved glycine of the WIGL motif has been replaced by a serine. Because CL-43, CL-46 and conglutinin are considered to be derived from SP-D in Bovidae [23], the grouping of these collectins is unlikely to be random. However, similar to compact C-type lectins that completely lack the long loop and thereby also the WIGL motif of CTLDs, the overall structure is not affected by this variation.

There are several other highly conserved residues outside the hydrophobic cores. Most likely, they are functionally or structurally important, although the precise function of many of them is unknown. One exception is the C-terminal aromatic residue in the conserved CEF motif of collectins. Structural studies of SP-A and SP-D have shown that this residue is involved in inter- and intrachain bonding with neck residues, thereby reducing the positional flexibility of the CRD relative to the neck domain in trimeric collectin structures [24,25].

Ligand binding of collectins

Except for CL-P1, all of the described collectins belong to the mannose-type C-type lectins, indicating that they prefer binding ligands of the mannose type over the galactose type. However, SP-A, SP-D and MBL have been shown to have an even wider binding specificity because they can also bind nucleic acids [26,27], phospholipids [28] and nonglycosylated proteins [29], although these are not all mediated through the Ca2+-binding pocket of the CRD and will not be extensively discussed in the present review.

In several studies, the preference of individual collectins for simple monosaccharides has been determined, which provides a similar general picture for binding specificities with small differences. For example, in a direct comparison between conglutinin, MBP and SP-A, SP-A had the highest specificity for N-acetylmannosamine (over L-fucose, maltose, glucose and mannose), whereas the other two collectins bound most strongly to N-acetylglucosamine [30]. Despite the small differences, bound sugars were all of the mannose type and these collectins had very low or no affinity for galactose. However, other studies have also reported galactose as a strong binding ligand for SP-A [31,32], indicating that the methodology used to test binding can affect the determined binding specificities. Similar binding specificities were found for CL-43, CL-46 and CL-K1 [7,33]. The only galactose-type collectin, CL-P1, has a strong affinity for galactose-type saccharides, with highest affinity not only for LewisX trisaccharides, but also for galactose and N-acetylgalactosamine [5,34,35].

Despite the overall similarities in binding monosaccharides, it is clear that several additional structural factors are involved in binding more complex ligands such as those encountered in vivo. Indeed, the three best studied collectins (MBP, SP-A and SP-D) have remarkably different binding characteristics for larger glycoconjugates found on microorganisms or host cells. For example, MBP-C has a much higher affinity for high-mannose glycoconjugates than MBP-A, whereas their mannose-binding capacities are similar [36]. Crystallographic studies have now revealed at least some of the underlying mechanisms of glycan specificity and have shown the involvement of several residues in the binding pocket of collectins (see below). In addition, it should be noted that monomeric collectins show very weak binding activity. Cooperative binding through trimeric lectin domains is required for the effective binding of ligand. This binding is further enhanced through the multivalency of ligands, emphasizing that collectins bind patterns of glycoconjugates rather than individual monosaccharides [37,38].

Although the overall CRD structures of SP-A and SP-D are very similar, the actual trimeric structures including the neck regions of SP-A and SP-D differ significantly. The angle between neck and lectin domain in SP-A is close to 90°, creating a flat-surfaced trimer (T-shape; Fig. 3A), whereas a larger kink between neck and lectin domains in SP-D (and MBL) results in a more Y-shaped trimer [24] (Fig. 3B). This spatially positions ligand-binding sites differently in the two collectins and affects the array of saccharides that each can bind. In other words, the pattern of the oligosaccharide ligand (or of several ligands within a molecular pattern) has to fit with the spatial organization of the binding sites of the collectin. In addition, the surface of SP-A is significantly more hydrophobic than that of SP-D. These differences suggest that SP-D is more suited to bind polysaccharides, whereas SP-A binds less polar surfaces [38]. This difference is reflected in some of the natural ligands of SP-A and SP-D. Both bind lipopolysaccharide of Gram-negative bacteria. However, whereas SP-A binds the lipid A core of lipopolysaccharides, SP-D interacts with heptoses associated with the inner core [39], or with high-mannose O-polysaccharides [40]. In addition, being part of the surfactant lipid/protein mixture secreted in the lung, SP-A interacts with hydrophobic dipalmitoylphosphatidylcholine (DPPC) molecules, whereas SP-D interacts with phosphatidylinositol (PI) [41]. The ability of SP-D (but not SP-A) to bind PI is explained by the presence of the inositol carbohydrate group in this phospholipid. PI is bound in the Ca2+-binding site 2 and its binding is essentially similar to saccharide binding to the lectin domain [42]. However, binding of SP-A to DPPC, which does not contain a carbohydrate moiety, is not mediated through the Ca2+-binding site. The exact binding site of DPPC is likely on the surface of the lectin domain [43,44].
Finally, interaction of surfactant collectins with ligands of Gram-positive bacteria is also different. SP-D can bind lipoteichoic acid and peptidoglycan in a Ca2+-dependent manner, whereas SP-A does not bind peptidoglycan and only weakly binds to lipoteichoic acid in a Ca2+-independent way [45,46].

Figure 3.

 Crystal structures of trimeric rSP-A and hSP-D. (A) Structure of the rSP-A trimer in complex with mannose (coordinates of the monomer are from the Protein Data Bank: accession no. 3PAK; the trimer was generated from crystallographic symmetry using coot [67]). Shown in two orientations (top and side view) with the CRDs in blue and the neck regions in red. The carbohydrate binding site Ca2+ ions are shown as magenta spheres and the carbohydrate ligand as yellow sticks. Location of the conserved N-glycosylation site in the CRD of SP-A is represented by green spheres. (B) Structure of the hSP-D trimer in complex with α-1,2 dimannose (coordinates are from the Protein Data Bank: accession no. 3G83). Details are shown as described for rSP-A in (A). Only mannose 1 of the dimannose ligand is shown. Green spheres represent the predicted location of the N-linked glycan, as present in pSP-D.

Specific residues involved in Ca2+ coordination and ligand binding

It is reasonable to assume that residues directly involved in Ca2+ and ligand binding are key determinants for ligand specificity of collectins. This is clear for the EPN motif because the EPN-QPD switch changes the binding specificity drastically from the mannose-type to the galactose-type. However, the residues involved in the direct binding of Ca2+ and ligand are highly conserved, and the small number of sequence changes that are described do not appear to have drastic effects. These sequence variations can be found not only in different types of collectins, but also between different animal species for a specific collectin. In SP-A, the third residue of the EPN motif is either arginine (rat, mouse, pig) or alanine (human, rhesus macaque; Fig. 2E). These motifs, which lack a side-chain carbonyl group on their third residue, would presumably be less capable of binding Ca2+ and ligand, although their ability to bind saccharides is not affected by this mutation. The crystal structure of rat SP-A indicates that the Ca2+-binding site is slightly altered compared to other C-type lectins in such a way that the side chain of arginine points away from the calcium ion. In the new orientation, the coordination bond with Ca2+ is created with the backbone carbonyl of arginine. This explains why naturally occurring SP-A variants with the motif EPA (human) or recombinantly produced SP-A with other residues such as His, Asn and Asp at position 3 of the EPN motif did not show differences in carbohydrate binding [30,47]. Another naturally occurring variant of the EPN motif is EPS, found in chicken in the SP-A homolog chicken lung lectin. In chicken lung lectin, this EPS triplet appears to restrict saccharide-binding to mannose [48]. Several other species have a (predicted) CL-L1 sequence (rat, mouse, rhesus macaque, dog) containing a similar EPS motif, although functional binding characteristics of these proteins are not available. 
Overall, it can be concluded that, despite the direct interactions of the residue on position 3 of the triplet with Ca2+ and ligands, variation is allowed at this position with only minor effects on glycan specificity of the CRD.

The WND motif is conserved in all mammalian collectins, except for canine SP-A where Asp is exchanged for Asn. This is unlikely to have a large effect on Ca2+ coordination because both amino acids contain side chain carbonyls. However, small differences are observed in ligand-binding specificities and the protease susceptibility of canine SP-A [32,49] but, without structural analyses, it remains unclear whether these differences can be attributed to this mutation. An interesting exception is found in chicken SP-A, where the WND motif is actually changed into WKD, incorporating a positively charged amino acid in the Ca2+-binding site [50] (Fig. 2E). Although the effect of this mutation is unknown because no functional studies have been performed, an Asn→Ala mutation in the WND motif of rat SP-A completely inhibited binding of glycosylated surface proteins of Pneumocystis carinii [51]. The final Ca2+-coordinating residue, Glu193 (rMBP-A numbering), is unchanged in all collectins.

Non-Ca2+-binding residues involved in ligand specificity

Considering that collectins bind to complex polysaccharides in vivo, observed differences in specificity for monosaccharides are only of limited value. The simplified categorization of collectins into mannose- or galactose-binding lectins based on the EPN motif is not sufficient to explain their binding to microorganisms and host cells. The specificity for larger ligands is mainly determined by residues close to the Ca2+-binding site that interact with the polysaccharide ligand. These residues are described to form a second binding pocket or a binding groove for the nonterminal saccharide residues of the ligand (Fig. 2D). A clear example is found in the differences in binding specificity of rMBP-A and MBP-C. Crystal structures of both MBP proteins have shown similar structures for their CRDs, and the monosaccharide-binding specificity is similar for both MBPs. However, differences were observed when oligomeric saccharides were used, demonstrating that MBP-C binds with higher affinity to trimannose and bivalent mannose-glycopeptides [52,53]. In addition, the binding of oligosaccharides was substantially different, with MBP-C binding best to the trimannosyl core of N-linked carbohydrates, whereas MBP-A preferentially bound to the terminal sugars [54]. The nature of this difference is found in the observation that the orientation of the bound mannose ring is reversed by 180° in MBP-C compared to MBP-A, which orients the rest of the saccharide differently in both collectins. Ng et al. [55] elegantly showed that a single residue in MBP-A (i.e. His189) influenced the orientation of bound ligand. Substitution of His189 to Val (its MBP-C equivalent) was sufficient to allow both binding orientations, thereby changing the specificity of MBP-A for oligosaccharides towards an MBP-C-like affinity profile. 
This example shows that the interaction of even one residue in the binding groove of the collectin with nonterminal residues of oligosaccharides can have a strong impact on ligand-binding specificity.

A similar effect caused by a single residue change is observed in SP-D binding to the carbohydrate moiety of PI. Human SP-D has a reduced affinity for myo-inositol compared to rat or mouse SP-D, which could be attributed to steric hindrance of ligand binding by residue Arg343 (residue 207 in rMBP-A) in human SP-D. Substitution of this residue, which flanks the ligand-binding site, for lysine (comparable to the rat/mouse SP-D sequence), increased the affinity for inositol five-fold, whereas the reverse mutation in rat SP-D had the opposite effect [56]. The same substitution resulted in a higher affinity for certain lipopolysaccharides [40] and also for mannosylated molecules of the mycobacterial membrane, showing that binding to whole organisms is also affected [42]. The steric hindrance of Arg343 in hSP-D was also observed in earlier studies where the Arg343Val mutant showed increased affinity for N-acetylglucosamine and glucose compared to the wild-type SP-D variant.

Other described effects of single amino acid changes in the CRD on ligand specificity include the D325N substitution (amino acid 189 in rMBP-A) in human SP-D, which increases specificity for mannose over N-acetylmannosamine [57], and changes at residues E333, E347, K348 and R349. These residues flank the carbohydrate-binding pocket in human SP-D [58] and all of them are considered to be part of a second binding pocket that interacts with nonterminal residues of glycans present on influenza A virus hemagglutinin (and probably other oligosaccharides). However, it is still unclear whether all these residues are involved in direct interaction with ligand or structurally affect other binding residues. Clearly, however, the interaction of parts of the lectin domain outside the main Ca2+-containing ligand-binding pocket has a large impact on binding specificity. More mutational and structural studies are required to obtain a full picture of the residues involved in ligand binding and their effect on glycan specificity for the whole protein.

Glycosylation of collectin CRD residues

In addition to the effect of specific residues in the CRD of collectins, two further structural features observed in a small group of collectins can affect ligand binding. The first feature is N-glycosylation of the CRD (Fig. 1A). SP-A has a conserved N-glycosylation motif in all animal species studied to date, resulting in attachment of a complex N-glycan at Asn187 (hSP-A numbering). This residue is located eight residues proximal to the EPN motif and at a reasonable distance from the carbohydrate-binding pocket (Fig. 3A). On the basis of structural analysis of nonglycosylated SP-A (recombinant protein with an N187S mutation), the glycan appears to point away from the actual binding pocket. However, considering the potential size of the carbohydrate group of 5–10 kDa, based on the size shift after deglycosylation of SP-A [59], it is hard to predict whether the carbohydrate moiety can affect ligand binding. No in-depth studies have been performed to determine the effect of the carbohydrate moiety of SP-A on carbohydrate binding, although one study indicated that there was no difference in Ca2+-dependent binding to glycosylated bacterial membrane proteins between glycosylated and nonglycosylated recombinant SP-A forms [51]. However, these SP-A molecules were expressed in insect cells and carried a smaller glycosyl moiety (∼ 3 kDa) than native rat SP-A.

The only other collectin with a proven N-glycosylated CRD is porcine SP-D (pSP-D) (Fig. 2E). The unique Asn residue at position 303 (Q167 in rMBP), which is not present in any other SP-D described so far, is glycosylated with a carbohydrate of ∼ 5 kDa [60] and is predicted to be located on the long loop, distal from the ligand-binding site, as in SP-A (the location of the N-glycan in pSP-D is modeled into the hSP-D structure, as shown in Fig. 3B). As for SP-A, a possible effect of glycosylation on the carbohydrate-binding specificity of the CRD has not been well studied [59].

Finally, CL-L1 represents another collectin that could be glycosylated. No protein characterization has been performed yet on this collectin, although all predicted CL-L1 sequences have multiple potential glycosylation sites. The first glycosylation site is identical to that of SP-A, indicating that this modification could have a similar function in both SP-A and CL-L1. The second potential glycosylation site is found on the Asn of the WND motif. It remains to be determined whether this Asn residue is truly glycosylated in vivo; if it is, this would have a major impact on the ability of these collectins to bind Ca2+ and thereby to bind ligands.

Interestingly, the carbohydrate moieties on both SP-A and pSP-D have been shown to have an important function in the antiviral activity of these collectins. The interaction of collectins with influenza A viruses has been well studied. The terminal sialic acids of the N-linked glycans on SP-A and pSP-D are likely to interact with the sialic acid receptor present on the hemagglutinin of influenza virus [59,61]. In addition to this viral recognition of sialic acids, the binding pocket of SP-D can also bind to the carbohydrate structures on the viral hemagglutinin, creating a high-affinity dual mode of interaction between these two proteins. Because SP-A does not recognize the hemagglutinin carbohydrates, it is totally dependent on its glycosylation for interaction with and neutralization of the virus. This is clearly shown in experiments with deglycosylated surfactant proteins, where SP-A completely lost its antiviral activity, whereas pSP-D retained part of its activity [59,62].

Insertion of extra loops in the CRD of collectins

The second feature to be discussed is the presence of an additional loop in the CRD in both bovine CL-43 and pSP-D. CL-43 contains an Arg-Ala-Lys (RAK) sequence immediately after the EPN motif. The observation that, in addition to CL-43, conglutinin and CL-46 also have enhanced anti-influenza A virus activity compared to human SP-D suggested that this loop, and possibly the two-amino-acid insertions present in conglutinin and CL-46, might be involved in enhanced binding to glycosylated viral proteins [63,64]. Mutational studies in which the RAK loop was inserted in human SP-D showed enhanced antiviral activity, as well as increased Ca2+-dependent binding to mannan and glycolipids derived from Mycobacterium tuberculosis [42,64]. This demonstrated that the insertion of this loop has direct effects on carbohydrate-binding properties. Interestingly, the insertion of a control loop consisting of three Ala residues in the human SP-D sequence enhanced mannan binding to a similar extent [63]. This indicates that the effects are possibly not loop-residue specific but that binding of ligand to existing SP-D residues is affected by insertion of the loop.

A second collectin that contains an additional loop structure in this region is pSP-D. This loop has a Ser-Gly-Ala (SGA) sequence and is positioned six residues distal to the EPN motif, placing it in close proximity to the carbohydrate-binding site. In vitro studies with N-deglycosylated pSP-D (to rule out contributions from sialic acid-mediated interactions) have shown that N-deglycosylated pSP-D is by far the most effective SP-D species in neutralizing influenza A virus. It is speculated that the SGA-loop region is involved in generating enhanced binding properties of the CRD of pSP-D to viral glycans. In addition, site-directed mutagenesis studies with recombinant human SP-D have also indicated that this porcine-specific SGA-loop is likely to be involved in mediating the antiviral activity of the N-linked glycan in the CRD of pSP-D [65].


Collectins represent a unique protein group that is involved in innate immunity. The main structural characteristic of collectins is that the functional unit is a trimer. This enables the protein to bind to saccharide patterns present on microorganisms instead of single saccharide residues. The structural organization within a trimer (i.e. the angle between neck and CRD) or the organization of multiple trimers within a larger oligomeric form of the protein (cruciform or ‘bouquet of flowers’ structure) is likely a crucial determinant for the in vivo ligand binding characteristics of these collectins.

Structural data on collectins crystallized with simple saccharide molecules have provided much information on the involvement of specific residues within the CRD fold in ligand binding. In some cases, differences in ligand specificity can be attributed to specific residues that vary among collectins. More importantly, the structural studies have shown that binding experiments with monosaccharides are of very limited value because the interaction between collectin and ligand extends beyond the terminal saccharide residue of the ligand. Therefore, a simple ranking of monosaccharide specificity does not provide many clues to the in vivo binding characteristics of collectins. The use of glycan arrays to test more complex sugars is a step towards understanding the complex interactions between collectins and ligands. However, the real future challenge lies in determining the collective interaction of collectin trimers with in vivo ligands (i.e. polysaccharides found on microorganisms or host cells). It will be challenging, although essential, to obtain structural data on the collectin oligomers complexed with these larger polysaccharide molecules. Only then can the contribution of the sequence variations in individual CRDs and the spatial arrangement of CRDs within the protein be evaluated.

On a further note, it will be interesting to study collectins other than MBP, SP-A and SP-D in more detail. The collectins CL-L1 and CL-K1 and the bovine collectins are relatively understudied, and these could have important new ligand-binding characteristics that are not observed in the better-studied collectins. There is also a need to look more closely at species differences. For example, chickens lack SP-D but appear to contain two SP-A homologs that largely lack the collectin domain [50,66]. It will be interesting to relate these differences to the functionality of the innate immune system in birds. Finally, the example of pSP-D shows that species differences within the existing collectin members can have a large effect on the functionality of the protein. The extra glycosylation site and an extra loop structure enable pSP-D to neutralize influenza A virus much more effectively than human SP-D. These observations show that additional structural features could provide important clues about the full potential of collectins in innate immunity.


This work was supported by a Program Grant (RGP0016/2009-C) of the Human Frontier Science Program (HFSP). The authors thank Michael Rynkiewicz for assistance with the Coot software.