Structural analysis of the Rhizoctonia solani agglutinin reveals a domain-swapping dimeric assembly

Authors


Correspondence

D. D. Leonidas, Department of Biochemistry and Biotechnology, University of Thessaly, 26 Ploutonos Str., 41221 Larissa, Greece

Fax: +30 2410565290

Tel.: +30 2410565278

E-mail: ddleonidas@bio.uth.gr

E. J. M. Van Damme, Department of Molecular Biotechnology, Ghent University, Coupure Links 653, B-9000, Ghent, Belgium

Fax: +32 92646219

Tel.: +32 92646086

E-mail: elsjm.vandamme@ugent.be

Abstract

Rhizoctonia solani agglutinin (RSA) is a 15.5-kDa lectin accumulated in the mycelium and sclerotia of the soil-borne plant pathogenic fungus R. solani. Although it is considered to serve as a storage protein and is implicated in fungal insecticidal activity, its physiological role remains unclear because no structure/function relationship information is available. Glycan arrays showed that RSA displays high selectivity towards terminal nonreducing N-acetylgalactosamine residues. We determined the amino acid sequence of RSA and the crystal structures of the free form and the RSA–N-acetylgalactosamine complex at 1.6 and 2.2 Å resolution, respectively. RSA is a homodimer composed of two monomers adopting the β-trefoil fold. Each monomer accommodates two different carbohydrate-binding sites in an asymmetric way. Despite RSA topology similarities with R-type lectins, the two-monomer assembly involves an N-terminal swap, thus creating a dimer association novel to R-type lectins. Structural characterization of the two carbohydrate-binding sites offers insights into the structural determinants of the RSA carbohydrate specificity.

Database

Structural data have been deposited in the Protein Data Bank database under accession numbers 4G9M and 4G9N.

Structured digital abstract

RSA and RSA bind by x-ray crystallography

Abbreviations
CNL, Clitocybe nebularis lectin

EW29, earthworm 29-kDa lectin

GalNAc, N-acetylgalactosamine

RSA, Rhizoctonia solani agglutinin

SSA, Sclerotinia sclerotiorum agglutinin

Introduction

Lectins are mono- or multivalent carbohydrate-binding (glyco)proteins that have no enzymic activity towards their carbohydrate ligand [1]. Basidiomycetes constitute a rich source of fungal lectins exhibiting various carbohydrate-binding properties and structures [2]. Although fungal lectins have a role as storage proteins [2, 3] and are also involved in morphogenesis and development of the fungi, their physiological importance remains enigmatic [4-6]. In 1987, Vranken et al. [7] first reported the purification and characterization of a lectin from Rhizoctonia solani (RSA). RSA is a homodimer of two noncovalently associated monomers of 15.5 kDa [7]. A more detailed analysis of the protein by ES-MS revealed a molecular mass of 15 547 Da for the RSA monomer. The protein is not glycosylated [8]. The lectin was detected in R. solani strains of many different anastomosis groups [2, 3, 9]. Furthermore, a detailed analysis of the accumulation and distribution of RSA in the mycelium and the sclerotia revealed that the lectin is developmentally regulated [4]. The high concentration of RSA in the sclerotia, as well as its developmental regulation, led to the hypothesis that the lectin serves as a storage protein in the resting structures of this fungus. Recently, RSA was also shown to possess insecticidal activity against both caterpillars and sucking insects [5]. Feeding of larvae of the cotton leafworm Spodoptera littoralis on a diet containing 10 mg·g⁻¹ RSA for 11 days led to a weight reduction of 89%, with a high mortality rate of 82%. Furthermore, rearing of pea aphid, Acyrthosiphon pisum, on a diet containing different concentrations of RSA for 3 days revealed strong toxicity towards this aphid, with an LC50 of 9 μM RSA [10]. Treatment of both insects with fluorescein isothiocyanate-labelled lectin demonstrated that RSA bound to the apical/luminal side of the midgut epithelium but was not taken up in the midgut cells.

At present, it is not clear how the lectin exerts its biological activity. One of the main reasons for the apparent lack of understanding of the in vivo function of this fungal lectin is the fact that no sequence information is available and little is known about the structure/function relationships for this protein. Edman degradation of RSA allowed the determination of the 60 N-terminal amino acids of the protein. Detailed homology searches provided evidence that RSA is structurally and evolutionarily related to the superfamily of R-type lectins [8]. The main characteristic of these lectin domains is that they consist of three repeated subdomains, referred to as α, β and γ repeats, each containing a more or less well conserved QXW motif [11]. In the present study, we report the molecular cloning of the full RSA sequence. Cloning and complete sequencing of the lectin cDNA was made possible by the sequence information available through genome sequencing of R. solani anastomosis group 3 (AG-3). We also report the crystal structures of the free RSA at high resolution (1.6 Å) and in complex with N-acetylgalactosamine (GalNAc) at 2.2 Å resolution. The crystal structure reveals that the RSA structure has a β-trefoil motif, as previously observed in other fungal lectins [6, 12-14], although with an unusual dimer association. The RSA–GalNAc complex structure reveals two distinct carbohydrate-binding sites, and elucidates the structural determinants of the RSA carbohydrate-binding specificity.

Results and Discussion

Molecular cloning of RSA

A BLAST search of the draft sequence of the R. solani AG-3 genome allowed the retrieval of a 3.2-kbp genomic fragment with high sequence homology (79% sequence identity, 88% sequence similarity) to the N-terminal amino acid sequence of RSA. This sequence is preceded by a sequence containing the typical TATA and CAAT boxes characteristic of a promoter sequence (Fig. 1).

Figure 1.

RSA sequence deduced from a genomic DNA fragment of R. solani anastomosis group 3 (AG-3). CAAT and TATA boxes are shaded grey. Intron sequences are underlined. The start and stop codons of the RSA sequence are boxed. The numbers on the left refer to nucleotide positions on scaffold scf1119142671166.

A more detailed comparison of the translated genomic sequence with the N-terminal sequence of RSA and some sequences encoding other R-type lectins allowed the prediction of the complete coding sequence for RSA. The complete nucleotide sequence for RSA is 582 nucleotides and contains three introns with typical GC/AG boundaries. This sequence encoding RSA does not contain any signal peptide (Fig. 1). The amino acid composition of the translated sequences shows high similarity to the amino acid composition for the purified R. solani lectin [9].

The predicted RSA sequence was validated at the cDNA level. To this end, RNA was purified from different R. solani strains and transcribed into cDNA. PCR was performed using nested primers complementary to the 3′ and 5′ genomic sequence. Sequence analysis of the PCR-amplified fragments confirmed the RSA sequence of 142 amino acids (corresponding to a polypeptide of approximately 15.4 kDa), with minor differences among the cDNA fragments amplified for different R. solani anastomosis groups (Fig. 2).
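The sizes quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the 582-nucleotide figure spans the start codon to the stop codon with the three introns included, and it uses a rule-of-thumb average residue mass of 110 Da. Both are assumptions made for illustration and are not stated in the text.

```python
# Cross-check of the RSA gene and polypeptide sizes quoted in the text.
GENOMIC_LENGTH_NT = 582      # from the text; ASSUMED to include the introns
N_RESIDUES = 142             # from the text
N_INTRONS = 3                # from the text
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average; an assumption


def coding_length_nt(n_residues):
    """Nucleotides needed to encode n_residues plus one stop codon."""
    return 3 * (n_residues + 1)


coding_nt = coding_length_nt(N_RESIDUES)
intron_nt = GENOMIC_LENGTH_NT - coding_nt          # total intron length
mean_intron_nt = intron_nt / N_INTRONS
approx_mass_kda = N_RESIDUES * AVG_RESIDUE_MASS_DA / 1000.0

print(f"coding sequence incl. stop codon: {coding_nt} nt")
print(f"implied total intron length: {intron_nt} nt "
      f"(mean {mean_intron_nt:.0f} nt per intron)")
print(f"rough polypeptide mass: {approx_mass_kda:.2f} kDa")
```

Under these assumptions the three introns would average about 51 nucleotides each, and the rough mass estimate falls close to the approximately 15.4 kDa quoted for the 142-residue polypeptide.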

Figure 2.

Multiple sequence alignment of the N-terminal sequence for RSA, the deduced amino acid sequences for genomic and cDNA fragments encoding RSA, and the RSA sequence from X-ray structure. Identical residues are shaded in black, whereas similar residues are shown in grey.

Carbohydrate-binding specificity of RSA

To determine its fine carbohydrate-binding specificity, RSA was incubated with glycan arrays (Table 1). Different RSA concentrations were analyzed on the array and indicated that the lectin shows high selectivity towards terminal nonreducing GalNAc residues either in α or β configuration. No concentration-dependent binding of RSA to the array was observed. Both single GalNAc residues, as well as the disaccharides GalNAcα1-3GalNAc, GalNAcβ1-4GlcNAc (diacetyllactosediamine) and GalNAcα1-3Gal, showed the highest reactivity with RSA. In addition, the lectin also recognizes the blood group A determinant GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAc. The reactivity of the lectin towards more complex N-glycans is strongly reduced.

Table 1. Glycan array binding of RSA at 0.5 μg·mL⁻¹. An overview is provided of the top 20 glycan structures with highest reactivity on the printed array version 3.1. The GalNAc motif is shown in bold

Glycan number (a) | Glycan structure: spacer (a) | RFU (b) | CV (c)
83 | GalNAcα1-3GalNAcβ–Sp8 | 58 032 | 0
90 | GalNAcβ1-4GlcNAcβ–Sp0 | 46 629 | 6
9 | α-GalNAc–Sp8 | 44 284 | 1
84 | GalNAcα1-3Galβ–Sp8 | 44 067 | 0
19 | β-GalNAc–Sp8 | 43 753 | 5
299 | GalNAcα1-3(Fucα1-2)Galβ–Sp18 | 42 141 | 2
80 | GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ–Sp8 | 34 443 | 9
366 | GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-2Manα1-3[GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-2Manα1-6]Manβ1-4GlcNAcβ1-4GlcNAcβ-Sp20 | 33 762 | 9
91 | GalNAcβ1-4GlcNAcβ–Sp8 | 32 652 | 67
329 | GalNAcβ1-3Galα1-4Galβ1-4GlcNAcβ1-3Galβ1-4Glcβ-Sp0 | 29 415 | 4
300 | GalNAcβ1-3Galβ-Sp8 | 22 779 | 20
85 | GalNAcα1-4(Fucα1-2)Galβ1-4GlcNAcβ-Sp8 | 21 604 | 94
79 | GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ-Sp0 | 19 545 | 17
373 | NeuAcα2-6Galβ1-4GlcNAcβ1-3GalNAc-Sp14 | 19 099 | 15
88 | GalNAcβ1-3Galα1-4Galβ1-4GlcNAcβ-Sp0 | 17 371 | 8
374 | NeuAcα2-3[Fucα1-3]Galβ1-4GlcNAcβ1-3GalNAc-Sp14 | 7225 | 31
319 | Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3(Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-6)Manβ1-4GlcNAcβ1-4GlcNAcβ-N(LT)AVL | 7197 | 39
77 | GalNAcα1-3(Fucα1-2)Galβ1-3GlcNAcβ-Sp0 | 7175 | 3
331 | GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ-Sp0 | 6820 | 9
286 | [6OSO3]Galβ1-4[6OSO3]GlcNAcβ-Sp0 | 6564 | 38

(a) A complete list of all glycans and spacer arms is available at http://www.functionalglycomics.org.
(b) The data are reported as mean relative fluorescence units (RFU) of six replicates for each glycan presented on the array after removing the highest and lowest values.
(c) Percentage of coefficient of variation determined as SD/mean × 100.
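The preference for terminal nonreducing GalNAc can be tallied directly from Table 1. The sketch below transcribes the 20 glycans without their spacers and checks which of them begin with GalNAc, since the nonreducing end is written first. The RFU values are read on the assumption that five-digit values are printed with a thin-space thousands separator; the helper name has_terminal_galnac is illustrative only.

```python
# Top-20 glycans of Table 1 as (glycan number, structure without spacer, RFU).
# Values are transcribed from the table; nothing here is new data.
TOP20 = [
    (83, "GalNAcα1-3GalNAcβ", 58032),
    (90, "GalNAcβ1-4GlcNAcβ", 46629),
    (9, "α-GalNAc", 44284),
    (84, "GalNAcα1-3Galβ", 44067),
    (19, "β-GalNAc", 43753),
    (299, "GalNAcα1-3(Fucα1-2)Galβ", 42141),
    (80, "GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ", 34443),
    (366, "GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-2Manα1-3"
          "[GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-2Manα1-6]"
          "Manβ1-4GlcNAcβ1-4GlcNAcβ", 33762),
    (91, "GalNAcβ1-4GlcNAcβ", 32652),
    (329, "GalNAcβ1-3Galα1-4Galβ1-4GlcNAcβ1-3Galβ1-4Glcβ", 29415),
    (300, "GalNAcβ1-3Galβ", 22779),
    (85, "GalNAcα1-4(Fucα1-2)Galβ1-4GlcNAcβ", 21604),
    (79, "GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ", 19545),
    (373, "NeuAcα2-6Galβ1-4GlcNAcβ1-3GalNAc", 19099),
    (88, "GalNAcβ1-3Galα1-4Galβ1-4GlcNAcβ", 17371),
    (374, "NeuAcα2-3[Fucα1-3]Galβ1-4GlcNAcβ1-3GalNAc", 7225),
    (319, "Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-3"
          "(Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-6)"
          "Manβ1-4GlcNAcβ1-4GlcNAcβ", 7197),
    (77, "GalNAcα1-3(Fucα1-2)Galβ1-3GlcNAcβ", 7175),
    (331, "GalNAcα1-3(Fucα1-2)Galβ1-4GlcNAcβ1-3Galβ1-4GlcNAcβ1-3"
          "Galβ1-4GlcNAcβ", 6820),
    (286, "[6OSO3]Galβ1-4[6OSO3]GlcNAcβ", 6564),
]


def has_terminal_galnac(structure):
    """True if the nonreducing end (written first) is a GalNAc residue."""
    # Monosaccharide entries carry a leading anomer label such as "α-".
    for prefix in ("α-", "β-"):
        if structure.startswith(prefix):
            structure = structure[len(prefix):]
    return structure.startswith("GalNAc")


galnac = [g for g in TOP20 if has_terminal_galnac(g[1])]
others = [g for g in TOP20 if not has_terminal_galnac(g[1])]

print(len(galnac), "of", len(TOP20), "glycans carry a terminal GalNAc")
print("glycans without terminal GalNAc:", [g[0] for g in others])
print("highest RFU among those:", max(g[2] for g in others))
```

The four entries without a terminal GalNAc are the three sialylated glycans and the sulfated disaccharide, all of which lie in the lower part of the ranking.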

Description of the overall structure

The structures of free RSA and of RSA in complex with GalNAc were determined at 1.6 and 2.2 Å resolution, respectively. RSA crystallizes as a homodimer and 142 (mol A) and 143 (mol B) of its residues are well defined within the electron density map. RSA is a compact molecule, with the exception of the eight N-terminal residues that form a loop protruding from the main structure, with approximate monomer dimensions of 30 × 36 × 32 Å (Fig. 3A). The two monomers have almost identical structures, with r.m.s.d. of 0.31 Å (Cα atoms), 0.38 Å (main-chain) and 0.94 Å (all atoms). In the native structure, the dimer contains 247 bound water molecules, whereas 15 residues in mol A and three residues in mol B of the RSA dimer display alternative conformations for their side chains. Each RSA molecule is composed primarily of antiparallel β strands (β1–β12, β2–β3, β4–β5, β6–β7, β8–β9, β10–β11) connected by large loops and four small 3₁₀ helical regions (residues 25–27, 42–45, 115–117 and 132–135) forming an up and down structural motif (Fig. 3A). The RSA topology reveals similarities with the ricin-like or R-type lectin family [15]. Each molecule can be divided into three subdomains α (β1–β4), β (β5–β8) and γ (β9–β12) assembling around a pseudo three-fold axis in a three-lobed organization, adopting the well described β-trefoil fold for lectins (Fig. 3A) [16]. Structural superposition of the three subdomains reveals that their secondary structure is very similar, with r.m.s.d. values in the range 0.6–1.6 Å for their Cα atoms. The characteristic R-type lectin sequence motif (QXW)3 [11] is conserved for subdomains α and γ in the form of DNQKW repeats (Fig. 3B). Each repeat coincides with the 3₁₀ helices preceding strands β4 and β11, respectively, whereas there are no disulfide bonds to stabilize the β-trefoil fold.

Figure 3.

(A) Schematic presentation of the RSA dimer. Strands are labelled β1 to β12, whereas the C- and N- termini are also labelled. The two monomers are shown: molecule A on the right and molecule B on the left. Each molecule is divided into three subdomains: α (magenta), β (cyan) and γ (pale yellow). The three GalNAc molecules bound (depicted as ball and stick models) in domains α (A) and γ (B) of molecule A and in domain α (C) of molecule B are shown with the corresponding electron density map. (B) Structure-based sequence alignment of the three individual subdomains of RSA; residues involved in carbohydrate binding at subdomain α and γ are marked with an asterisk. Strands and helices are represented as arrows and spirals, respectively. (C) Schematic representations of the CNL and SSA dimer association.

A domain-swapping dimeric assembly

The two RSA monomers are related by a noncrystallographic two-fold symmetry (spherical polar angles, ω = 39.9°, φ = −14.1° and χ = 179.5°). Thus, the RSA dimer is formed by a 180° rotation of one molecule relative to the other and the assembly is stabilized by extensive intermolecular interactions, primarily involving atoms from both termini and from the loop regions connecting the β4–β5 and β8–β9 strands (Fig. 3A). The first eight N-terminal residues in each molecule adopt an extended conformation, with protrusions similar to hooks holding the two monomers together (Fig. 3A), which is a structural characteristic that has not been observed in other homologous lectins such as the ricins [17] or those of the β-trefoil fold [6, 16] (Fig. 3C). The first six N-terminal residues of mol A that form this hook participate in hydrogen-bond interactions with the three C-terminal residues (139–141) of mol B. The residues forming each hook participate in 109 van der Waals interactions with residues from the other monomer, whereas they also form 38 van der Waals interactions among themselves. Overall, the two monomers are held together by 10 intermolecular hydrogen-bonding interactions between nine residues from mol A and eight residues from mol B and an extensive network of van der Waals interactions. In addition, numerous water molecules mediate interface interactions, mostly hydrogen bonds, that contribute further to the dimer stabilization.

The solvent accessibilities of the RSA monomers are 6813 and 7067 Å² for mol A and mol B, respectively. Upon dimer association these surfaces reduce to 5656 and 5895 Å², indicating that areas of 1157 and 1172 Å² become inaccessible to solvent in mol A and mol B, respectively. This represents approximately 17% of the surface of each molecule that becomes buried. The contribution is almost equal for the nonpolar and polar residues, which contribute 865 and 833 Å², respectively, of the surface that becomes inaccessible upon dimer formation. The total buried surface area in the complex is 1698 Å². The shape correlation statistic Sc [18], which is used to quantify the shape complementarity of interfaces and provide an idea of the ‘goodness of fit’ between two surfaces, is 0.6 for the association of RSA mol A and mol B.
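The buried areas and percentages in this paragraph follow from the quoted accessibilities by subtraction. The sketch below reproduces that arithmetic using only the numbers given in the text; the variable and function names are illustrative.

```python
# Solvent-accessible surface areas (in square angstroms) quoted in the text.
MONOMER_ASA = {"A": 6813.0, "B": 7067.0}    # isolated monomers
IN_DIMER_ASA = {"A": 5656.0, "B": 5895.0}   # same monomers within the dimer
NONPOLAR_BURIED = 865.0                     # nonpolar contribution
POLAR_BURIED = 833.0                        # polar contribution


def buried(free, bound):
    """Return (buried area, buried area as a percentage of the free surface)."""
    area = free - bound
    return area, 100.0 * area / free


interface = {mol: buried(MONOMER_ASA[mol], IN_DIMER_ASA[mol])
             for mol in MONOMER_ASA}

for mol, (area, percent) in interface.items():
    print(f"mol {mol}: {area:.0f} A^2 buried ({percent:.1f}% of its surface)")

print(f"nonpolar + polar: {NONPOLAR_BURIED + POLAR_BURIED:.0f} A^2")
```

The per-monomer values round to the approximately 17% stated in the text, and the nonpolar and polar contributions add up to the quoted total of 1698 Å².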

N-terminal domain swapping has been observed in many dimeric protein assemblies: first in RNase A [19] and BS-RNase [20] and, later, in other proteins, including lectins such as cyanovirin-N (a well known anti-HIV protein) [21], C-type lectin-like proteins from snake venoms [22] and the conger eel galectin [23]. This domain swapping has been shown to stabilize their quaternary structure. However, in cyanovirin-N, it has been observed only in the crystal structure and not in the NMR-determined structure [24]. We cannot rule out the possibility that the N-domain swapping observed in the RSA structure is a result of the experimental conditions employed for crystallization. However, all six residues of the N-terminal domain in both molecules of the RSA dimer are well defined within the 1.6 Å resolution electron density map. The mean temperature factors of these six residues are 30.1 and 32.1 Å² for RSA molecules A and B, respectively. These values are not far from those observed for the entire protein molecules A and B: 33.0 and 22.5 Å². Furthermore, analysis of the protein interface using PISA [25] suggested that this quaternary structure will be stable in solution with a dissociation barrier of 6.3 kcal·mol⁻¹.

The binding of N-acetylgalactosamine in the crystal

The structural similarity of the three RSA subdomains indicated the existence of three carbohydrate recognition sites. The crystal structure of the RSA–GalNAc complex revealed two distinct binding sites, approximately 23 Å apart (Fig. 3A): one in subdomain α, which has GalNAc bound in both RSA molecules of the noncrystallographic dimer, and another in subdomain γ, which is occupied by GalNAc only in mol A of the RSA dimer in the crystal. The carbohydrate moiety is well defined within the electron density map. Binding at the carbohydrate recognition site in subdomain γ of mol B is impeded by crystal packing and, more specifically, by the packing of the C-terminal residues of a symmetry molecule. This may explain the deterioration of the RSA native crystals that occurred as a result of prolonged soaking with GalNAc solutions. The putative carbohydrate recognition site in subdomain β, as defined by the superposition of the three subdomains onto each other, is not occupied in either of the two molecules of the noncrystallographic dimer. Structural superposition of the free and the liganded RSA structure indicates that the protein does not undergo any significant overall conformational change upon carbohydrate binding. The r.m.s.d. between the free and the liganded RSA structures is 0.27 Å (main chain atoms) and 0.65 Å (all atoms). However, there are some significant local conformational changes in the residues involved in carbohydrate binding (as described below).

The carbohydrate-binding site at subdomain α is a shallow groove created by the loops connecting the β2–β3 and β4–β5 antiparallel strands (Fig. 3A). The binding of GalNAc is almost identical for both molecules in the dimer, involving a network of hydrogen bonds (9 in mol A and 7 in mol B) formed with protein atoms (Table 2) and van der Waals interactions (57 in mol A and 54 in mol B) (Table 3). The main anchoring point of GalNAc is the galactose ring that binds at the groove by exploiting π-stacking interactions (atoms C5 and O6 of GalNAc) with the atoms of the aromatic ring of Tyr37 and van der Waals interactions (C6 of GalNAc) with atoms of the indole ring of the perpendicularly oriented Trp24 (Fig. 4A). GalNAc forms three hydrogen-bond interactions through its O3 with Asp22 (OD1), His40 (NE2) and Asn44 (ND2); four (mol A) or three (mol B) hydrogen bonds between O4 and Asp22 (OD1), Arg25 (N), Gln35 (NE2) and Asn44 (ND2); and two hydrogen-bond interactions between O6 and Glu121 (OE2), Gln35 (NE2) (Fig. 4A and Table 2). One water molecule is involved in hydrogen bonding with N2. In mol B, GalNAc forms additional hydrogen bonds through O1, N2 and O7 atoms with water molecules, indicating a stronger binding. This might explain the lower B factor values observed for the GalNAc molecule bound in mol B compared to the one bound to mol A (Table 4). Upon binding, GalNAc displaces two water molecules and the side chains of Arg25 and Trp24 from this carbohydrate-binding site.

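Table 2. Potential hydrogen bonds of GalNAc with RSA. Distances are given in Å; a hyphen indicates no entry

GalNAc atom | Subdomain α: RSA atom | Subdomain α: mol A | Subdomain α: mol B | Subdomain γ: RSA atom | Subdomain γ: mol A
O1 | Water | - | 2.6 | Water | 3.1
O1 | - | - | - | Water | 2.3
O3 | Asp22 OD2 | 2.5 | 2.5 | Asp112 OD1 | 2.9
O3 | His40 NE2 | 2.6 | 2.9 | Asp112 OD2 | 2.7
O3 | Asn44 ND2 | 2.9 | 2.9 | - | -
O4 | Asp22 OD1 | 2.6 | 2.8 | Asp112 OD1 | 3.0
O4 | Arg25 N | 3.0 | 2.7 | Asn134 ND2 | 3.3
O4 | Gln35 NE2 | 2.8 | 2.8 | Water | 3.0
O4 | Asn44 ND2 | 3.3 | - | - | -
O6 | Gln35 NE2 | - | - | Ser116 OG | 2.6
O6 | Glu121 OE2 | 2.5 | 2.7 | - | -
O7 | Water | - | 3.0 | Ser78 OG | 3.2
N2 | His40 NE2 | 3.1 | - | Water | 3.2
N2 | Water | 3.0 | 2.9 | - | -
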
Table 3. Van der Waals interactions between GalNAc and RSA in the crystal. Van der Waals distances are calculated using van der Waals contact radii of 2.05 Å for C, 1.54 Å for O and 1.7 Å for N. A hyphen indicates no entry

GalNAc atom | Subdomain α, mol A | Subdomain α, mol B | Subdomain γ, mol A
C1 | Arg25 CG | - | Glu115 CG, CA, N
C2 | Asn44 ND2; His40 NE2 | Asn44 ND2 | Glu115 N
C3 | Tyr37 CZ, CE1; Asn44 ND2; Asp22 OD1, OD2; His40 NE2 | His40 NE2; Tyr37 CZ; Asp22 OD2; Asn44 ND2 | Glu115 N; Asp112 OD1, OD2; Asn134 ND2
C4 | Asp22 OD2, OD1; Asn44 ND2; Arg25 N | Asp22 OD2, OD1; Asn44 ND2; Arg25 CA, N | Tyr127 CE2; Asp112 OD1; Asn134 ND2
C5 | Tyr37 CE2 | Tyr37 CE2; Arg25 CG, N | Glu115 CA; Asn134 ND2
C6 | Gln35 NE2; Glu121 OE2; Trp24 NE1, CD1, CG; Arg25 N | Gln35 NE2; Glu121 OE2; Trp24 NE1, CD1, CG; Arg25 CG, N | Asn134 ND2, OD1
C7 | His40 NE2, CE1, CD2 | His40 NE2, CE1; Asn44 OD1 | Ile125 CD1; Ser78 OG
C8 | His40 CE1, ND1, CG, CD2, NE2 | His40 NE2, CE1; Asn44 OD1 | Ser78 OG
O3 | Asn44 OD1, CG; His40 CE1, CD2; Asp22 OD2, CG; Gln45 NE2; Tyr37 CD1 | Gln45 NE2; His40 CD2, CE1; Asp22 OD1, CG; Asn44 OD1, CG | Ile125 CD1; Glu115 N; Tyr127 CD2, CG; Asp112 CG
O4 | Arg25 N; Asp22 OD1, CG; Asn44 ND2; Trp24 CA; Leu23 O | Leu23 O; Asp22 OD2, CG; Asn44 ND2; Trp24 CA, N, C | Asn134 CG, OD1; Asp112 OD2, CG
O5 | Arg25 CG | Arg25 CD, CG, CA, N | -
O6 | Tyr37 CE2; Gln35 OE1, CD; Glu121 CD, CG; Trp24 NE1, CD1 | Tyr37 CE2; Gln35 OE1, CD; Glu121 CD, CG; Trp24 NE1, CD1 | Glu115 C; Asn134 ND2, OD1, CG; Ser116 CB, N
O7 | Asn44 OD1, ND2; His40 CE1, NE2 | Asn44 ND2, CG | Ile125 CD1; Tyr127 CE1, CD1, OH, CZ
N2 | His40 CE1, CD2, NE2 | His40 NE2 | Glu115 CG, N
Total | 10 residues (57 contacts) | 10 residues (54 contacts) | 7 residues (40 contacts)

Table 4. Crystallographic statistics

Parameter | RSA | RSA–GalNAc
Space group | C2 | C2
Unit cell, a, b, c (Å), β (°) | 121.98, 61.09, 32.79, 93.00 | 130.58, 61.25, 32.84, 93.45
Resolution (Å) | 20.73–1.60 | 13.86–2.20
Outermost shell (Å) | 1.66–1.60 | 2.37–2.20
Reflections measured | 401 357 | 56 973
Unique reflections | 27 436 | 13 153
Rsymm (a) | 0.090 (0.342) | 0.101 (0.254)
Completeness (%) (a) | 86.4 (37.5) | 99.5 (99.3)
<I/σ(I)> (a) | 4.9 (1.5) | 11.5 (5.8)
Wilson B factor (Å²) | 21.05 | 10.02
Rcryst (a) | 0.181 (0.272) | 0.150 (0.169)
Rfree (a) | 0.232 (0.267) | 0.232 (0.285)
Number of solvent molecules | 247 | 262
r.m.s.d. from ideality: bond lengths (Å) | 0.006 | 0.007
r.m.s.d. from ideality: angles (°) | 1.1 | 1.1
Mean B factor: protein atoms (Å²) (molecule A/molecule B) | 33.0/22.5 | 11.7/8.1
Mean B factor: solvent molecules (Å²) | 37.0 | 16.9
Mean B factor: carbohydrate atoms (Å²), subdomain α (molecule A/molecule B) | - | 15.8/10.0
Mean B factor: carbohydrate atoms (Å²), subdomain γ (molecule A) | - | 24.8

(a) Values given in parentheses are for the outermost shell.

Figure 4.

Stereoview of the carbohydrate recognition sites of RSA with bound GalNAc molecules in mol A at domain α (A) and at domain γ (B). The side chains of residues comprising the RSA carbohydrate-binding sites are shown as ball-and-stick models. Bound waters are shown as black spheres. Hydrogen-bond interactions are indicated by dashed lines. (C) Stereoview of the structural superposition of subdomain α (white) onto subdomain γ (grey) with bound GalNAc molecules.

The carbohydrate-binding site at subdomain γ of mol A is created by the β10–β11 hairpin and the loop connecting β11–β12 strands (Fig. 3A). The binding of GalNAc at this site involves 40 van der Waals interactions and six hydrogen bonds with the surrounding protein atoms (Tables 2 and 3). A conserved characteristic in the mode of binding compared to that in the site of the α subdomain is the stacking interaction of the galactose moiety with the aromatic ring of a tyrosine (Tyr127 in this case) (Fig. 4B), whereas there is no relevant interaction of GalNAc with a Trp residue analogous to that with Trp24 in the other binding site. In detail, O6 forms a hydrogen bond with Ser116 OG, whereas O4 forms two hydrogen bonds with protein atoms Asp112 OD1 and Asn134 ND2 and one with a water molecule (Table 2). Furthermore, O3 hydrogen bonds with the two carboxyl oxygens of Asp112, whereas O7 forms one hydrogen-bond interaction with Ser78 OG and O1 is involved in two hydrogen bonds with water molecules (Fig. 4B). Structural superposition of the α and γ subdomains in complex with GalNAc reveals that the two sugar molecules superpose well with two significant differences, with the two planes being inclined to each other by 30° and by a rotation of 180° with respect to an axis perpendicular to the pyranose ring (i.e. the N-acetyl groups are pointing in opposite directions) (Fig. 4C). The number of protein–carbohydrate interactions at the RSA–GalNAc crystal structure (Tables 2 and 3) implies that the binding site at subdomain α has a higher affinity for GalNAc than the binding site of subdomain γ. However, NMR titration studies such as those performed for the R-type lectin EW29 [26, 27] may be required to determine the association constant for carbohydrates at each site.

Subdomains α and γ have a GalNAc binding site, whereas β does not, despite the structural similarity of the three subdomains. Structural superposition of subdomains α and γ onto subdomain β revealed that the aromatic residues tyrosine and tryptophan are missing from the equivalent site in subdomain β, thus offering a possible explanation for the lack of GalNAc binding at this subdomain.

On forming the complex with the protein, GalNAc becomes buried in both binding sites. The solvent accessibility surface area of the free GalNAc molecule is 370 Å². Upon binding, this area shrinks to 118 and 117 Å² at the two binding sites at subdomains α and γ, respectively, indicating that surface areas of 252 Å² (68%) and 253 Å² (68%) become buried in the protein complex at the two sites. The greatest contribution comes from the nonpolar groups, which contribute 73 (62%) and 60 Å² (51%) of the ligand surface that becomes inaccessible at the two RSA-binding sites. On the protein surface, surface areas of 124 and 154 Å² become inaccessible on binding of GalNAc to the two binding sites at subdomains α and γ, respectively. The total buried surface area (protein plus ligand) for the RSA–GalNAc complex is 375 and 407 Å² at the two binding sites at subdomains α and γ, respectively. The shape correlation indices (Sc) [18] between the GalNAc surface and the RSA surface are 0.73 and 0.76, respectively.
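The ligand-side figures in this paragraph can be recomputed from the quoted accessibilities. The sketch below uses only numbers from the text and illustrative names; it also adds the protein-side and ligand-side areas to compare with the quoted totals.

```python
# Accessible surface areas (in square angstroms) quoted in the text.
FREE_LIGAND_ASA = 370.0
BOUND_LIGAND_ASA = {"alpha": 118.0, "gamma": 117.0}
PROTEIN_BURIED = {"alpha": 124.0, "gamma": 154.0}


def ligand_burial(site):
    """Return (ligand area buried, percent of free ligand, protein + ligand)."""
    ligand_buried = FREE_LIGAND_ASA - BOUND_LIGAND_ASA[site]
    percent = 100.0 * ligand_buried / FREE_LIGAND_ASA
    total = ligand_buried + PROTEIN_BURIED[site]
    return ligand_buried, percent, total


for site in ("alpha", "gamma"):
    ligand_buried, percent, total = ligand_burial(site)
    print(f"subdomain {site}: ligand {ligand_buried:.0f} A^2 ({percent:.0f}%), "
          f"protein plus ligand {total:.0f} A^2")
```

The sum for subdomain α comes out 1 Å² above the quoted 375 Å², which is presumably a rounding effect in the published values; the sum for subdomain γ matches the quoted 407 Å².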

RSA is specific for GalNAc rather than Gal, and the complex structure provides a structural explanation. The N-acetamide group is not involved in any hydrogen-bond interactions in the binding site of subdomain α, although it participates in 15 van der Waals interactions with protein residues at this site (Table 3). These interactions, albeit weak, are numerous and, collectively, can form the structural basis for the GalNAc specificity of this site. In the binding site of subdomain γ, this group is involved in a hydrogen bond with the side chain of Ser78 (Table 2) and forms 10 van der Waals interactions with protein residues (Table 3). Hence, it appears that the hydrogen bond of Ser78 (Fig. 4B), together with van der Waals interactions, forms the structural basis of the GalNAc specificity at this site. The hydrogen bond with Ser78 appears to be RSA-specific because this residue is not conserved in other similar carbohydrate-binding proteins (see below).
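The contact counts quoted for the N-acetamide group can be checked against Table 3. The sketch below stores, for each GalNAc atom, the number of protein atoms listed in the three columns of that table, then sums them for the whole sugar and for the four atoms of the N-acetamide group (C7, C8, O7 and N2). The per-atom counts were obtained by counting the entries in Table 3; the names used are illustrative.

```python
# Number of van der Waals contacts per GalNAc atom, counted from Table 3, as
# (subdomain alpha mol A, subdomain alpha mol B, subdomain gamma mol A).
CONTACTS = {
    "C1": (1, 0, 3),
    "C2": (2, 1, 1),
    "C3": (6, 4, 4),
    "C4": (4, 5, 3),
    "C5": (1, 3, 2),
    "C6": (6, 7, 2),
    "C7": (3, 3, 2),
    "C8": (5, 3, 1),
    "O3": (8, 7, 5),
    "O4": (6, 7, 4),
    "O5": (1, 4, 0),
    "O6": (7, 7, 6),
    "O7": (4, 2, 5),
    "N2": (3, 1, 2),
}
SITES = ("alpha_molA", "alpha_molB", "gamma_molA")
N_ACETAMIDE_ATOMS = ("C7", "C8", "O7", "N2")


def site_totals(atoms):
    """Sum the contacts of the given GalNAc atoms for each binding site."""
    return {site: sum(CONTACTS[atom][i] for atom in atoms)
            for i, site in enumerate(SITES)}


totals = site_totals(CONTACTS)
n_acetamide = site_totals(N_ACETAMIDE_ATOMS)

print("all GalNAc atoms:", totals)
print("N-acetamide group:", n_acetamide)
```

The whole-sugar sums agree with the totals row of Table 3 (57, 54 and 40 contacts), and the N-acetamide sums for mol A agree with the 15 and 10 interactions cited in the text for subdomains α and γ.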

Glycan array analysis showed that RSA displays specificity for terminal GalNAc, although it also has considerable affinity for sugars that have a terminal sialic acid residue. Carbohydrate-binding proteins with the β-trefoil fold, such as an EW29 mutant [28] or the haemagglutinin subcomponent of the Clostridium botulinum type C 16S progenitor toxin [29], bind terminal sialic acid residues. Substitution of the GalNAc molecules by N-acetylneuraminic acid molecules in the RSA–GalNAc complex and energy minimization of the complex structure revealed that the N-acetylneuraminic acid molecules can be easily accommodated at the carbohydrate-binding pockets of RSA without any structural impediments. This is facilitated by the fact that the long C1′ group protrudes towards the solvent, whereas the C5′ substituents engage in hydrogen-bond interactions with the side chain atoms of Glu121 and Gln35 in the binding site of subdomain α. At the RSA binding site of subdomain γ, a similar approach revealed that structural impediments with Asp112 and Asn134 would prevent sialic acid binding, at least in a similar structural mode to GalNAc. This observation offers a possible structural explanation for the weaker affinity of RSA for sugars that have a terminal sialic acid residue as opposed to those with GalNAc.

Comparison with other lectins

A PDBeFold [30] three-dimensional structural similarity search against all proteins in the Protein Data Bank revealed that the RSA structure is most similar to Clitocybe nebularis R-type lectin (CNL) [12], the phytopathogenic ascomycete Sclerotinia sclerotiorum lectin (SSA) [6] and the galactose-binding lectin EW29 from the earthworm Lumbricus terrestris [13]. The r.m.s.d. between the RSA structure and that of CNL, SSA and EW29 is 1.17, 1.23 and 1.52 Å, respectively, for 143 equivalent Cα atoms. However, the RSA dimer association is totally different from the other three lectins that do not display the RSA N-terminal loop interchange between the two subunits and the 180° rotation of one monomer with respect to the other. Most of their other differences are located at the loop regions, with the most profound ones being located at the loops connecting strands β3 and β4 (part of the carbohydrate binding site in subdomain α), β4 and β5, and β6 and β7. In addition, SSA and EW29 have an extended loop in the place of strand β8 of RSA. The most interesting differences occur in the carbohydrate-binding sites. CNL and SSA have one carbohydrate-binding site instead of two. The carbohydrate recognition sites of CNL, SSA and EW29 superpose on the binding site of subdomain α of RSA, as does the galactose ring in all four lectin–carbohydrate complexes. Comparative structural analysis of the four lectins shows that residues Asp22 and Asn44 of RSA are conserved. Gln35 is an Ile in EW29 and CNL, and an Arg in SSA. Glu121 of RSA is Gly in SSA and EW29, and Ser in CNL. The two aromatic residues Trp24 and Tyr37 whose side chains form the shallow groove of the RSA carbohydrate recognition site are not conserved in the other three lectins. Thus, Trp24 is Glu in EW29, and Thr in CNL. Although SSA, similar to RSA, has a Trp residue in this position, the orientation of its side chain is totally different from that in RSA and it superimposes with the Gln35 of RSA (Arg in SSA). Tyr37 is conserved only structurally: SSA has a Tyr residue in this position, EW29 a Trp and CNL a His residue. Therefore, it appears that there is always a ring available to allow π-stacking interactions with the carbohydrate ligand. Despite the differences in the carbohydrate recognition sites of the four lectins, the GalNAc moiety binds with a similar orientation to all four of them within a radius of 2.0 Å, with O6 pointing to the interior of the structure and the N-acetyl group toward the solvent.

Structural superposition of the four proteins at the carbohydrate-binding site of subdomain γ revealed that Asp112 is conserved in EW29 and there is a Thr in CNL and SSA. Asn134 is conserved in all four, Ser116 is conserved in EW29, there is a Tyr in SSA, and there is no equivalent residue in CNL. Ser78 is a Gly in CNL and there is no equivalent residue in SSA or in EW29. A hydrogen bond between Ser78 of RSA and the N-acetamide group of GalNAc appears to be the structural basis for the RSA specificity towards GalNAc; thus, this sequence difference between the proteins might be important. The most interesting difference between the four proteins in the architecture of this site occurs in the position of the Tyr127 in RSA, where there is a tryptophan residue in EW29, a glycine in CNL and an alanine in SSA. However, there is a tyrosine residue in this site in SSA at the opposite side of Tyr127. This difference might trigger a different binding mode of the sugar from the one observed in RSA because the tyrosine ring overlaps with the position of atom O6 upon superposition of the two complex structures. Such a difference is observed in the structural comparison between the RSA–GalNAc and the EW29–GalGalNAc complex structures, where the positions of the two galactose moieties are related by a 180° rotation, with the O6 and N-acetyl groups pointing in opposite directions.

Conclusions

The fine carbohydrate-binding specificity of RSA was determined by glycan array analysis, which revealed a high selectivity for terminal GalNAc residues, as well as for human blood group A determinants, in good agreement with the agglutination of human blood group A erythrocytes. This glycan-binding pattern is similar to that of the ricin-related fungal lectins CNL [31] and SSA [32], although it is very different from the specificity reported for the lectin from Rhizoctonia bataticola, which preferentially binds complex N-glycans [33]. Similar to CNL and SSA, RSA shows low binding to galactose and lactose on the array, although the lectin binds efficiently to a galactose-Sepharose column. It can be envisaged that the carbohydrate specificity of RSA is essential for its insecticidal activity, which most probably relies on the recognition of GalNAcylated glycans in insect tissues or cells [5, 34].

RSA shows a clear preference for GalNAc over GlcNAc and it appears to be capable of differentiating structurally between the two carbohydrate ligands. Both identified binding sites in RSA recognize GalNAc but not GlcNAc. Neither soaking of native crystals in GlcNAc solution nor growing crystals in the presence of 25 mM GlcNAc resulted in any binding of GlcNAc in the crystal. The crystal structure of RSA reveals that the GalNAc molecule binds by anchoring the epimeric hydroxyl group (O4) and the hydroxyl groups O3 and O6 through direct hydrogen-bond interactions with protein residues in two distinct carbohydrate recognition sites. However, further studies are required to establish the physiological role of the two carbohydrate-binding sites in this lectin. The novel dimeric assembly reveals the diversity in the quaternary structures within the R-type lectin family, whereas the interchange of the N-terminal tail that assists in the association of the two molecules represents an alternative to disulfide bonds for the stabilization of the lectin assembly.

Experimental procedures

Purification of the lectin

The R. solani lectin was extracted from a 30-day-old liquid culture [strain 207.84 (AG-1C) from the Centraalbureau voor Schimmelcultures (Baarn, The Netherlands)] grown in potato dextrose broth. The lectin was purified by several rounds of affinity chromatography on immobilized galactose followed by ion exchange chromatography, as described previously [7].

Molecular cloning of RSA

Genomic DNA was extracted from young mycelia or sclerotia of R. solani AG-3 using the protocol described by Weiland [35]. PCR amplification of the RSA sequence was performed using primers deduced from the genomic sequence obtained from the draft Rhizoctonia genome sequence (http://www.rsolani.org). Alternatively, PCR primers derived from conserved regions in the ricin B lectin sequences from fungi were used. PCR fragments were cloned in the pJET vector (ThermoScientific, Erembodegem, Belgium) and sequenced (Agowa, Berlin, Germany).

RNA was extracted from young developing sclerotia of R. solani AG-3 using the Fast RNA Pro Green kit (Q-BioGene, Illkirch, France) in accordance with the manufacturer's instructions. cDNA was synthesized and used for PCR amplification of the RSA sequence using primers deduced from the genomic sequence. PCR fragments were cloned and sequenced as described above. The RSA sequences were used to search for homologous sequences at NCBI GenBank using blastn and blastx [36]. The sequences were aligned using clustalx [37].

Glycan array screening

Glycan array analysis was performed by The Consortium for Functional Glycomics (http://www.functionalglycomics.org/static/consortium/resources/resourcecoreh.shtml) as described previously by Blixt et al. [38]. The printed array version 3.1 with 377 glycan targets was used for the analyses. RSA was directly labelled with Alexa488 and array slides were incubated with various concentrations of the lectin (0.5, 0.25 and 0.10 μg·mL−1). For this purpose, RSA was diluted to the desired concentrations in binding buffer (Tris-buffered saline containing 10 mM CaCl2, 10 mM MgCl2, 1% BSA, 0.05% Tween 20, pH 7.5). After application of 70 μL to the printed surface of the array, a cover slip was added and the slide was incubated at room temperature in a humidified chamber away from light for 1 h. Bound lectin was detected by measuring the fluorescence. The complete primary data set for each protein is available on the website of the Consortium for Functional Glycomics (http://www.functionalglycomics.org).
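The dilution step described above follows the usual relation C1·V1 = C2·V2. The sketch below shows the arithmetic for the three lectin concentrations used on the array; the stock concentration and the prepared volume are hypothetical values chosen only for illustration, because neither is stated in the text, and the function name `stock_volume_ul` is likewise an assumption.

```python
def stock_volume_ul(stock_ug_per_ml, target_ug_per_ml, final_volume_ul):
    """Volume of stock (in microlitres) needed to reach the target
    concentration in the given final volume, from C1 * V1 = C2 * V2."""
    if stock_ug_per_ml <= 0 or final_volume_ul <= 0:
        raise ValueError("stock concentration and final volume must be positive")
    if target_ug_per_ml > stock_ug_per_ml:
        raise ValueError("target concentration exceeds stock concentration")
    return target_ug_per_ml * final_volume_ul / stock_ug_per_ml


STOCK_UG_PER_ML = 10.0   # hypothetical labelled-lectin stock, not from the paper
FINAL_VOLUME_UL = 100.0  # hypothetical prepared volume (70 uL is applied per slide)
TARGETS_UG_PER_ML = [0.5, 0.25, 0.10]  # concentrations reported in the text

plan = {t: stock_volume_ul(STOCK_UG_PER_ML, t, FINAL_VOLUME_UL)
        for t in TARGETS_UG_PER_ML}
for target, volume in plan.items():
    buffer_volume = FINAL_VOLUME_UL - volume
    print(f"{target:.2f} ug/mL: {volume:.2f} uL stock + "
          f"{buffer_volume:.2f} uL binding buffer")
```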

X-ray crystallography

RSA crystals were grown using the sitting drop vapour diffusion method [39]. Crystallization conditions were established after screening a series of commercially available crystal screens with the Oryx Nano crystallization robot (Douglas Instruments, Hungerford, UK) in sitting drop 98-well plates. Initial successful crystallization conditions were optimized further to obtain large crystals, and sitting drops were set up with equal volumes of RSA (15 mg·mL−1) and reservoir solution [0.1 M sodium acetate, pH 4.2, 20% (v/v) poly(ethylene glycol) 8000, 0.2 M sodium chloride, 10 mM sodium citrate]. RSA crystals grew after several days at 16 °C. Prior to data collection, crystals were rapidly soaked in a cryoprotectant solution consisting of 20% (w/v) poly(ethylene glycol) 400, 0.1 M sodium acetate, pH 4.2, 20% (w/v) poly(ethylene glycol) 8000, 0.2 M sodium chloride and 10 mM sodium citrate, and were flash frozen in a nitrogen stream at 100 K. Data were collected at station ID9111-2 at Max-Lab Synchrotron source (Lund, Sweden) (λ = 1.03954 Å) to 1.6 Å resolution. Crystal orientation, integration of reflections, inter-frame scaling, partial reflection summation, and scaling and merging of the two datasets were performed using hkl [40]. The automated molecular replacement software mrbump [41] was used to create a model using the RSA sequence and searching for homologous structures. The structure of the S. sclerotiorum agglutinin SSA (Protein Data Bank code: 2x2s) [6] was identified as a model. Docking of the sequence to the model and model building of the missing parts were performed using arp/warp [42]. Alternate cycles of manual rebuilding with coot [43] and refinement using refmac [44] improved the quality of the model. A final refinement round with phenix [45] was performed to finalize the model using TLS refinement with 14 TLS groups. 
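For orientation, the wavelength and resolution limit of a data set are related to the maximum scattering angle by Bragg's law, d = λ / (2 sin θ). The following sketch evaluates the 2θ angle that corresponds to the two wavelength and resolution pairs reported in this section; the function name `two_theta_deg` is hypothetical, and the calculation says nothing about the actual detector geometry used.

```python
import math


def two_theta_deg(wavelength_a, d_spacing_a):
    """Scattering angle 2-theta (degrees) at which a reflection with the
    given d-spacing is observed, from Bragg's law d = lambda / (2 sin(theta))."""
    sin_theta = wavelength_a / (2.0 * d_spacing_a)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("d-spacing not reachable at this wavelength")
    return 2.0 * math.degrees(math.asin(sin_theta))


# Wavelengths and resolution limits as reported in the text
datasets = {
    "free RSA (synchrotron)": (1.03954, 1.6),
    "RSA-GalNAc (Cu-K-alpha)": (1.54178, 2.2),
}
for label, (wavelength, resolution) in datasets.items():
    angle = two_theta_deg(wavelength, resolution)
    print(f"{label}: 2-theta at the resolution limit = {angle:.1f} degrees")
```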
The RSA–GalNAc complex was obtained by soaking a single native crystal in a solution of 12.9 mM GalNAc in reservoir solution [0.1 M sodium acetate, pH 4.2, 20% (v/v) poly(ethylene glycol) 8000, 0.2 M sodium chloride, 10 mM sodium citrate] for 48 h at 16 °C. Soaking at higher GalNAc concentrations caused severe deterioration of the RSA crystals, which might be associated with the 7% increase in the a axis of the unit cell with respect to the unit cell dimensions of the free RSA crystal (Table 4). Prior to data collection, the soaked crystal was briefly immersed in cryoprotectant solution [20% (w/v) poly(ethylene glycol) 400, 0.1 M sodium acetate, pH 4.2, 20% (w/v) poly(ethylene glycol) 8000, 0.2 M sodium chloride, 10 mM sodium citrate] and was flash frozen in a nitrogen stream for data collection at 100 K. X-ray diffraction data were collected on a SuperNova diffractometer (Oxford Diffraction Ltd, Abingdon, UK) with a 135-mm Atlas charge-coupled device area detector using a Nova microfocus Cu-Kα radiation source (λ = 1.54178 Å). Data were processed with crysalispro [46] and scaled using scala from ccp4 [47]. Crystallographic refinement of the complex was performed by maximum-likelihood methods using refmac [44] followed by a final round with phenix [45]. The starting model employed for the refinement of the complex was the structure of the free RSA at 1.6 Å resolution. Data collection and refinement statistics are presented in Table 4. The stereochemistry of the protein residues was validated by molprobity [48] and all protein residues are in the most favoured or allowed regions of the Ramachandran plot. Hydrogen bonds and van der Waals interactions were calculated with contact as implemented in ccp4 [47], applying distance cut-offs of 3.3 and 4.0 Å, respectively. Accessible molecular surfaces were calculated with naccess [49]. All artwork was prepared using pymol [50].
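The distance cut-offs quoted above (3.3 Å for hydrogen bonds and 4.0 Å for van der Waals interactions) can be illustrated with a simple classifier. This is a minimal sketch and not the contact program from ccp4: it assumes that a hydrogen-bond candidate requires both atoms to be N or O, ignores angular geometry, and uses hypothetical toy coordinates and a hypothetical function name `classify_contact`.

```python
import math

HBOND_CUTOFF_A = 3.3  # hydrogen-bond distance cut-off from the text
VDW_CUTOFF_A = 4.0    # van der Waals distance cut-off from the text
POLAR_ELEMENTS = {"N", "O"}  # assumption: only N/O act as donors or acceptors


def classify_contact(atom_a, atom_b):
    """Classify a pair of atoms, each given as (element, (x, y, z)),
    using distance criteria only."""
    (elem_a, xyz_a), (elem_b, xyz_b) = atom_a, atom_b
    distance = math.dist(xyz_a, xyz_b)
    both_polar = elem_a in POLAR_ELEMENTS and elem_b in POLAR_ELEMENTS
    if both_polar and distance <= HBOND_CUTOFF_A:
        return "hydrogen-bond candidate"
    if distance <= VDW_CUTOFF_A:
        return "van der Waals contact"
    return "no contact"


# Hypothetical toy atom pairs, not taken from the RSA structures
pairs = [
    (("O", (0.0, 0.0, 0.0)), ("N", (2.9, 0.0, 0.0))),
    (("C", (0.0, 0.0, 0.0)), ("O", (3.0, 0.0, 0.0))),
    (("C", (0.0, 0.0, 0.0)), ("C", (3.8, 0.0, 0.0))),
    (("O", (0.0, 0.0, 0.0)), ("O", (5.0, 0.0, 0.0))),
]
for a, b in pairs:
    print(a[0], b[0], classify_contact(a, b))
```

A production analysis would also check donor and acceptor chemistry and bond angles, which a purely distance-based filter cannot capture.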

Acknowledgements

We thank Dr David Smith and the Consortium for Functional Glycomics (funded by the NIGMS GM62116) for help and advice with the glycan array analysis. This work was supported in part by the Fund for Scientific Research – Flanders (FWO grants G.0022.08 and KAN 1.5.069.09.N.); the Research Council of Ghent University (BOF2007⁄GOA⁄0017); and the Postgraduate Programmes ‘Biotechnology – Quality Assessment in Nutrition and the Environment’ and ‘Application of Molecular Biology – Molecular Genetics – Molecular Markers’ (Department of Biochemistry and Biotechnology, University of Thessaly). X-ray data collection was supported by grants from European Community – Research Infrastructure Action under the FP6 ‘Structuring the European Research Area’ Programme (through the Integrated Infrastructure Initiative ‘Integrating Activity on Synchrotron and Free Electron Laser Science’) for work at the Synchrotron Radiation Source MAX-Lab (Lund, Sweden) and EMBL Hamburg Outstation (Hamburg, Germany).
