Structural aspects of the collectin carbohydrate recognition domain
When the sequences of the CRDs of the nine collectins are compared (Fig. 2E), all characteristics of classical CTLDs are observed (Cys1 is missing in Fig. 2E because only the C-terminal sequence is shown for clarity). The structurally important cysteines are conserved in all collectins, and all five Ca2+-coordinating residues are present in the human and bovine collectins. All three hydrophobic cores are present, although their identification is often based on sequence similarity rather than identity. Some deviations are found in the WIGL motif, which is less conserved. Rat MBP-A contains an FLGI motif, which still has the characteristics of a WIGL motif, but in SP-D and the bovine collectins the otherwise extremely well-conserved glycine of the WIGL motif has been replaced by serine. Because CL-43, CL-46 and conglutinin are considered to be derived from SP-D in Bovidae, this shared substitution is unlikely to be coincidental. However, as in compact C-type lectins, which completely lack the long loop and thereby also the WIGL motif of CTLDs, the overall structure is not affected by this variation.
Several other highly conserved residues lie outside the hydrophobic cores. Most likely they are functionally or structurally important, although the precise function of many of them remains unknown. One exception is the C-terminal aromatic residue in the conserved CEF motif of collectins. Structural studies of SP-A and SP-D have shown that this residue is involved in inter- and intrachain bonding with neck residues, thereby reducing the positional flexibility of the CRD relative to the neck domain in trimeric collectin structures [24,25].
Ligand binding of collectins
Except for CL-P1, all of the described collectins belong to the mannose-type C-type lectins, indicating that they prefer ligands of the mannose type over those of the galactose type. However, SP-A, SP-D and MBL have been shown to have an even broader binding specificity because they can also bind nucleic acids [26,27], phospholipids and nonglycosylated proteins, although not all of these interactions are mediated through the Ca2+-binding pocket of the CRD, and they will not be discussed extensively in the present review.
Several studies have determined the preference of individual collectins for simple monosaccharides, and together they give a broadly consistent picture of binding specificity with small differences. For example, in a direct comparison between conglutinin, MBP and SP-A, SP-A had the highest specificity for N-acetylmannosamine (over L-fucose, maltose, glucose and mannose), whereas the other two collectins bound most strongly to N-acetylglucosamine. Despite these small differences, the bound sugars were all of the mannose type, and these collectins had very low or no affinity for galactose. However, other studies have reported galactose as a strong binding ligand for SP-A [31,32], indicating that the method used to test binding can affect the measured specificities. Similar binding specificities were found for CL-43, CL-46 and CL-K1 [7,33]. The only galactose-type collectin, CL-P1, has a strong affinity for galactose-type saccharides, with the highest affinity for the Lewis X trisaccharide, galactose and N-acetylgalactosamine [5,34,35].
Despite the overall similarity in monosaccharide binding, several additional structural factors are clearly involved in binding more complex ligands, such as those encountered in vivo. Indeed, the three best-studied collectins (MBP, SP-A and SP-D) have remarkably different binding characteristics for larger glycoconjugates found on microorganisms or host cells. For example, MBP-C has a much higher affinity for high-mannose glycoconjugates than MBP-A, whereas their mannose-binding capacities are similar. Crystallographic studies have now revealed at least some of the mechanisms underlying glycan specificity and have shown the involvement of several residues in the binding pocket of collectins (see below). In addition, monomeric collectins show very weak binding activity: cooperative binding through trimeric lectin domains is required for effective ligand binding. This binding is further enhanced by the multivalency of ligands, emphasizing that collectins bind patterns of glycoconjugates rather than individual monosaccharides [37,38].
Although the overall CRD structures of SP-A and SP-D are very similar, the trimeric structures including the neck regions differ significantly. The angle between the neck and lectin domain in SP-A is close to 90°, creating a flat-surfaced trimer (T-shape; Fig. 3A), whereas a larger kink between the neck and lectin domains in SP-D (and MBL) results in a more Y-shaped trimer (Fig. 3B). This positions the ligand-binding sites differently in the two collectins and affects the array of saccharides that each can bind. In other words, the pattern of the oligosaccharide ligand (or of several ligands within a molecular pattern) has to fit the spatial organization of the collectin's binding sites. In addition, the surface of SP-A is significantly more hydrophobic than that of SP-D. These differences suggest that SP-D is better suited to bind polysaccharides, whereas SP-A binds less polar surfaces. This difference is reflected in some of the natural ligands of SP-A and SP-D. Both bind lipopolysaccharide of Gram-negative bacteria; however, whereas SP-A binds the lipid A core of lipopolysaccharides, SP-D interacts with heptoses associated with the inner core, or with high-mannose O-polysaccharides. In addition, as components of the surfactant lipid/protein mixture secreted in the lung, SP-A interacts with hydrophobic dipalmitoylphosphatidylcholine (DPPC) molecules, whereas SP-D interacts with phosphatidylinositol (PI). The ability of SP-D (but not SP-A) to bind PI is explained by the inositol carbohydrate group in this phospholipid. PI is bound at Ca2+-binding site 2, in a manner essentially similar to saccharide binding by the lectin domain. By contrast, binding of SP-A to DPPC, which has no carbohydrate moiety, is not mediated through the Ca2+-binding site; the DPPC-binding site is probably located elsewhere on the surface of the lectin domain [43,44].
Finally, the surfactant collectins also differ in their interactions with ligands of Gram-positive bacteria. SP-D binds lipoteichoic acid and peptidoglycan in a Ca2+-dependent manner, whereas SP-A does not bind peptidoglycan and binds lipoteichoic acid only weakly and in a Ca2+-independent way [45,46].
Figure 3. Crystal structures of trimeric rSP-A and hSP-D. (A) Structure of the rSP-A trimer in complex with mannose (coordinates of the monomer are from the Protein Data Bank: accession no. 3PAK; the trimer was generated from crystallographic symmetry using Coot). Shown in two orientations (top and side view) with the CRDs in blue and the neck regions in red. The carbohydrate-binding site Ca2+ ions are shown as magenta spheres and the carbohydrate ligand as yellow sticks. The location of the conserved N-glycosylation site in the CRD of SP-A is represented by green spheres. (B) Structure of the hSP-D trimer in complex with α-1,2 dimannose (coordinates are from the Protein Data Bank: accession no. 3G83). Details are shown as described for rSP-A in (A). Only mannose 1 of the dimannose ligand is shown. Green spheres represent the predicted location of the N-linked glycan, as present in pSP-D.
Specific residues involved in Ca2+ coordination and ligand binding
It is reasonable to assume that residues directly involved in Ca2+ and ligand binding are key determinants of collectin ligand specificity. This is clear for the EPN motif, because the EPN-QPD switch drastically changes the binding specificity from the mannose type to the galactose type. However, the residues directly involved in binding Ca2+ and ligand are highly conserved, and the few sequence changes described do not appear to have drastic effects. These sequence variations are found not only between different types of collectins, but also between animal species for a given collectin. In SP-A, the third residue of the EPN motif is either arginine (rat, mouse, pig) or alanine (human, rhesus macaque; Fig. 2E). Because these residues lack a side-chain carbonyl group, such motifs would be expected to bind Ca2+ and ligand less well; however, their ability to bind saccharides is not affected by this substitution. The crystal structure of rat SP-A shows that the Ca2+-binding site is slightly altered compared with other C-type lectins, such that the side chain of arginine points away from the calcium ion. In this orientation, Ca2+ is instead coordinated by the backbone carbonyl of arginine. This explains why naturally occurring SP-A variants with the motif EPA (human), or recombinant SP-A with other residues such as His, Asn and Asp at position 3 of the EPN motif, did not show differences in carbohydrate binding [30,47]. Another naturally occurring variant of the EPN motif is EPS, found in chicken lung lectin, the chicken SP-A homolog. In chicken lung lectin, this EPS triplet appears to restrict saccharide binding to mannose. Predicted CL-L1 sequences from several other species (rat, mouse, rhesus macaque, dog) contain a similar EPS motif, although the binding characteristics of these proteins have not yet been determined.
Overall, it can be concluded that, despite the direct interactions of the residue at position 3 of the triplet with Ca2+ and ligands, variation is tolerated at this position with only minor effects on the glycan specificity of the CRD.
The WND motif is conserved in all mammalian collectins except canine SP-A, where Asp is exchanged for Asn. This is unlikely to have a large effect on Ca2+ coordination because both amino acids have side-chain carbonyls. However, small differences have been observed in the ligand-binding specificities and protease susceptibility of canine SP-A [32,49], and without structural analyses it remains unclear whether these differences can be attributed to this substitution. An interesting exception is chicken SP-A, in which the WND motif is changed to WKD, placing a positively charged amino acid in the Ca2+-binding site (Fig. 2E). The effect of this change is unknown because no functional studies have been performed, although an Asn→Ala mutation in the WND motif of rat SP-A completely abolished binding of glycosylated surface proteins of Pneumocystis carinii. The final Ca2+-coordinating residue, Glu173, is unchanged in all collectins.
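As a rough illustration of the motif-based categorization discussed above, the Python sketch below assigns a coarse predicted monosaccharide class from the Ca2+-binding triplet alone. It is a hypothetical helper (`classify_crd_motif` is not taken from any published tool), it assumes the input has already been trimmed to a CRD fragment, and it deliberately ignores everything outside the triplet, which, as discussed below, strongly shapes specificity for larger ligands.

```python
# Hypothetical helper: a heuristic sketch, not a validated predictor.
# Assumes `crd_fragment` is a one-letter amino acid sequence already trimmed
# to the CRD; scanning a full-length protein could give spurious hits.

# Mannose-type triplets: canonical EPN plus the natural position-3 variants
# mentioned in the text (EPR, EPA, EPS), reflecting the tolerance at position 3.
MANNOSE_TYPE_TRIPLETS = ("EPN", "EPR", "EPA", "EPS")
GALACTOSE_TYPE_TRIPLETS = ("QPD",)


def classify_crd_motif(crd_fragment: str) -> str:
    """Return a coarse predicted class based only on the EPN/QPD triplet."""
    seq = crd_fragment.upper()
    # Checked first, so a QPD hit wins if both kinds of triplet are present.
    if any(t in seq for t in GALACTOSE_TYPE_TRIPLETS):
        return "galactose-type"
    if any(t in seq for t in MANNOSE_TYPE_TRIPLETS):
        return "mannose-type"
    return "unclassified"


if __name__ == "__main__":
    # Made-up fragments for illustration only; not real collectin sequences.
    for fragment in ("GAEPNNWG", "GAEPRNWG", "GAQPDNWG", "GAGGGNWG"):
        print(fragment, classify_crd_motif(fragment))
```

Because the check is purely string-based, it cannot capture effects such as the mannose restriction reported for the EPS triplet of chicken lung lectin.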
Non-Ca2+-binding residues involved in ligand specificity
Because collectins bind complex polysaccharides in vivo, observed differences in specificity for monosaccharides are of limited value. The simplified categorization of collectins into mannose- or galactose-binding lectins based on the EPN motif is not sufficient to explain their binding to microorganisms and host cells. Specificity for larger ligands is mainly determined by residues close to the Ca2+-binding site that interact with the polysaccharide ligand. These residues have been described as forming a second binding pocket or a binding groove for the nonterminal saccharide residues of the ligand (Fig. 2D). A clear example is the difference in binding specificity between rMBP-A and MBP-C. Crystal structures of both MBPs show similar CRD structures, and their monosaccharide-binding specificities are similar. However, differences were observed with oligomeric saccharides: MBP-C binds trimannose and bivalent mannose glycopeptides with higher affinity [52,53]. In addition, oligosaccharide binding differed substantially, with MBP-C binding best to the trimannosyl core of N-linked carbohydrates, whereas MBP-A preferentially bound the terminal sugars. This difference arises because the bound mannose ring is rotated by 180° in MBP-C relative to MBP-A, which orients the rest of the saccharide differently in the two collectins. Ng et al. elegantly showed that a single residue in MBP-A (His189) influences the orientation of the bound ligand. Substituting His189 with Val (its MBP-C equivalent) was sufficient to allow both binding orientations, thereby shifting the oligosaccharide specificity of MBP-A towards an MBP-C-like affinity profile.
This example shows that the interaction of even one residue in the binding groove of the collectin with nonterminal residues of oligosaccharides can have a strong impact on ligand-binding specificity.
A similar effect of a single residue change is observed in SP-D binding to the carbohydrate moiety of PI. Human SP-D has a lower affinity for myo-inositol than rat or mouse SP-D, which could be attributed to steric hindrance of ligand binding by Arg343 (residue 207 in rMBP-A) in human SP-D. Substitution of this residue, which flanks the ligand-binding site, by lysine (as in the rat/mouse SP-D sequence) increased the affinity for inositol five-fold, whereas the reverse mutation in rat SP-D had the opposite effect. The same substitution resulted in a higher affinity for certain lipopolysaccharides and for mannosylated molecules of the mycobacterial membrane, showing that binding to whole organisms is also affected. Steric hindrance by Arg343 in hSP-D was also apparent in earlier studies, in which the Arg343Val mutant showed increased affinity for N-acetylglucosamine and glucose compared with wild-type SP-D.
Other reported effects of single amino acid changes in the CRD on ligand specificity include D325N (amino acid 187 in rMBP-A) in human SP-D, which increases specificity for mannose over N-acetylmannosamine, as well as changes at residues E333, E347, K348 and R349. These residues flank the carbohydrate-binding pocket in human SP-D and are all considered to be part of a second binding pocket that interacts with nonterminal residues of glycans on influenza A virus hemagglutinin (and probably other oligosaccharides). However, it is still unclear whether all of these residues interact directly with ligand or instead structurally affect other binding residues. Clearly, however, parts of the lectin domain outside the main Ca2+-containing ligand-binding pocket have a large impact on binding specificity. More mutational and structural studies are required to obtain a full picture of the residues involved in ligand binding and their effect on the glycan specificity of the whole protein.
Glycosylation of collectin CRD residues
In addition to the effect of specific residues in the CRD of collectins, two further structural features, found in a small group of collectins, can affect ligand binding. The first is N-glycosylation of the CRD (Fig. 1A). SP-A has an N-glycosylation motif that is conserved in all animal species known to date, resulting in attachment of a complex N-glycan at Asn187 (hSP-A numbering). This residue is located eight residues proximal to the EPN motif, at some distance from the carbohydrate-binding pocket (Fig. 3A). On the basis of structural analysis of nonglycosylated SP-A (recombinant protein with an N187S mutation), the glycan appears to point away from the binding pocket. However, given that the carbohydrate group may be as large as 5–10 kDa (based on the size shift after deglycosylation of SP-A), it is hard to predict whether it can affect ligand binding. No in-depth studies have examined the effect of the SP-A carbohydrate moiety on carbohydrate binding, although one study found no difference between glycosylated and nonglycosylated recombinant SP-A in Ca2+-dependent binding to glycosylated bacterial membrane proteins. However, these SP-A molecules were expressed in insect cells and carried a smaller glycosyl moiety (∼3 kDa) than native rat SP-A.
The only other collectin with a proven N-glycosylated CRD is porcine SP-D (pSP-D) (Fig. 2E). The Asn at position 303 (Q167 in rMBP), which is not present in any other SP-D described so far, carries a carbohydrate of ∼5 kDa and is predicted to be located at the long loop, distal from the ligand-binding site as in SP-A (the location of the N-glycan in pSP-D is modeled into the hSP-D structure, as shown in Fig. 3B). As for SP-A, the possible effect of this glycosylation on the carbohydrate-binding specificity of the CRD has not been well studied.
Finally, CL-L1 is another collectin that could be glycosylated. No protein characterization of this collectin has been performed yet, although all predicted CL-L1 sequences contain multiple potential glycosylation sites. The first is identical to the site in SP-A, suggesting that this modification could have a similar function in SP-A and CL-L1. The second potential glycosylation site is the Asn of the WND motif. It remains to be determined whether this Asn is glycosylated in vivo; if it is, this would clearly have a major impact on the ability of these collectins to bind Ca2+ and thereby ligands.
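Potential N-glycosylation sites such as those discussed for SP-A, pSP-D and CL-L1 are generally predicted from the N-X-S/T sequon, in which X is any residue except proline. The Python sketch below scans a sequence for such sequons; `find_sequons` is a hypothetical helper written for illustration, and a sequon marks only a potential site, so whether the Asn is actually glycosylated in vivo still has to be shown experimentally, as noted above for CL-L1.

```python
import re

# N-X-S/T sequon with X != Pro. The zero-width lookahead (?=...) lets
# overlapping sequons (e.g. "NNST") each be reported for their own Asn.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # Made-up fragments, not real collectin sequences. The Asn of a WND motif
    # forms a sequon only when the residue after Asp is Ser or Thr.
    print(find_sequons("GEWNDSAC"))  # -> [4]  (W-N-D-S)
    print(find_sequons("GEWNDAAC"))  # -> []   (no S/T after Asp)
    print(find_sequons("GANPSA"))    # -> []   (Pro at X blocks the sequon)
```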
Interestingly, the carbohydrate moieties of both SP-A and pSP-D have been shown to play an important role in the antiviral activity of these collectins. The interaction of collectins with influenza A viruses has been well studied. The terminal sialic acids of the N-linked glycans on SP-A and pSP-D are likely to interact with the sialic acid-binding site of the influenza virus hemagglutinin [59,61]. In addition to this viral recognition of sialic acids, the binding pocket of SP-D can also bind the carbohydrate structures on the viral hemagglutinin, creating a high-affinity dual mode of interaction between these two proteins. Because SP-A does not recognize the hemagglutinin carbohydrates, it depends entirely on its own glycosylation to interact with and neutralize the virus. This is clearly shown in experiments with deglycosylated surfactant proteins, in which SP-A completely lost its antiviral activity, whereas pSP-D retained part of its activity [59,62].
Insertion of extra loops in the CRD of collectins
The second feature to be discussed is the presence of an additional loop in the CRD of both bovine CL-43 and pSP-D. CL-43 contains an Arg-Ala-Lys (RAK) sequence immediately after the EPN motif. The observation that CL-43, as well as conglutinin and CL-46, has enhanced anti-influenza A virus activity compared with human SP-D suggested that this loop, and possibly the two-amino-acid insertions present in conglutinin and CL-46, might be involved in enhanced binding to glycosylated viral proteins [63,64]. Mutational studies in which the RAK loop was inserted into human SP-D showed enhanced antiviral activity, and the insertion also increased Ca2+-dependent binding to mannan and to glycolipids derived from Mycobacterium tuberculosis [42,64]. This demonstrated that insertion of the loop directly affects carbohydrate-binding properties. Interestingly, insertion of a control loop of three Ala residues into the human SP-D sequence enhanced mannan binding to a similar extent. This suggests that the effects may not depend on the specific loop residues, but rather that insertion of the loop alters how existing SP-D residues bind ligand.
A second collectin with an additional loop in this region is pSP-D. This loop has a Ser-Gly-Ala sequence and is positioned six residues distal to the EPN motif, placing it close to the carbohydrate-binding site. In vitro studies with N-deglycosylated pSP-D (to rule out contributions from sialic acid-mediated interactions) have shown that N-deglycosylated pSP-D is by far the most effective SP-D species in neutralizing influenza A virus. It has been speculated that the SGA loop contributes to the enhanced binding of the pSP-D CRD to viral glycans. In addition, site-directed mutagenesis studies with recombinant human SP-D have indicated that this porcine-specific SGA loop is likely to be involved in mediating the antiviral activity of the N-linked glycan in the CRD of pSP-D.