Database
The atomic coordinates and structure factors have been deposited in the Protein Data Bank under the accession code 3GAX.
M. Jaskolski, Faculty of Chemistry, Department of Crystallography, A. Mickiewicz University, Grunwaldzka 6, 60-780 Poznan, Poland Fax +48 61 8291505 Tel: +48 61 8291274 E-mail: firstname.lastname@example.org
Human cystatin C (HCC) is a family 2 cystatin inhibitor of papain-like (C1) and legumain-related (C13) cysteine proteases. In pathophysiological processes, the nature of which is not understood, HCC is codeposited in the amyloid plaques of Alzheimer’s disease or Down’s syndrome. The amyloidogenic properties of HCC are greatly increased in a naturally occurring L68Q variant, resulting in fatal cerebral amyloid angiopathy in early adult life. In all crystal structures of cystatin C studied to date, the protein has been found to form 3D domain-swapped dimers, created through a conformational change of a β-hairpin loop, L1, from the papain-binding epitope. We have created monomer-stabilized human cystatin C, with an engineered disulfide bond (L47C)–(G69C) between the structural elements that become separated upon domain swapping. The mutant has drastically reduced dimerization and fibril formation properties, but its inhibition of papain is unaltered. The structure confirms the success of the protein engineering experiment to abolish 3D domain swapping and, in consequence, amyloid fibril formation. It illustrates for the first time the fold of monomeric cystatin C and allows verification of earlier predictions based on the domain-swapped forms and on the structure of chicken cystatin. Importantly, the structure defines the so-far unknown conformation of loop L1, which is essential for the inhibition of papain-like cysteine proteases.
The cystatin superfamily of cysteine protease inhibitors is subdivided, according to structural features and presence in the intracellular, extracellular or intravascular space, into three families: 1 (stefins), 2 (cystatins), and 3 (kininogens) [1–3]. Human cystatin C (HCC), a member of the family 2 cystatins, is an inhibitor of papain-like (C1) and legumain-like (C13) cysteine proteases . The inhibition of C1 proteases is especially potent, with dissociation constants (in the femtomolar range)  representing the strongest competitive inhibition known in biochemistry. HCC is found in all body fluids, with particularly high concentrations in the cerebrospinal fluid , and its function is to regulate the activity of cysteine proteases, either released from lysosomes of dying or damaged cells , or originating from microbial invasion . In common with other family 2 cystatins [1,4], the 120 residue polypeptide chain of HCC is expected to fold as a monomer with two well-conserved disulfide bridges (Cys73–Cys83 and Cys97–Cys117) in the C-terminal half of the sequence [8,9]. By analogy with enzyme complexes of family 1 cystatins [10–12], and chicken cystatin docked in the active site of papain , the inhibitory epitope of HCC has been postulated to include the N-terminal peptide Ser1–Val10, and two β-hairpin loops, L1, containing a signature motif QxVxG(55–59), and L2, with a characteristic Pro105-Trp106 sequence, all aligned in a wedge-like fashion at one side of the molecule .
The need to define a precise enzyme-binding motif was dictated by the idea of using it as a molecular template in the design of efficient small molecule inhibitors, targeting cysteine proteases involved in tissue-degenerative diseases, such as osteoporosis or paradontosis , or produced by highly virulent strains of bacteria and viruses [15–18]. To this end, we have carried out numerous crystallization experiments on papain–HCC mixtures. However, single crystals of a papain–HCC complex could never be obtained, as the incubated samples invariably undergo proteolytic degradation, probably because of impurities present in commercially available papain samples. Also, despite many years of effort, crystallization of monomeric HCC has not been achieved. Instead, a number of crystallization conditions have invariably led to crystals being formed from HCC dimers, arising as a consequence of 3D domain swapping [19–21], a phenomenon in which the monomeric fold is re-created in an oligomeric context, i.e. from fragments of the polypeptide chain contributed by different molecules . In a closed 3D domain-swapped dimer, the protein fold is essentially as in the monomeric form, except in a hinge element, which has a drastically changed conformation, allowing the protein to partially unfold and form a mutual grip with a similarly unfolded partner. In the HCC dimer, the hinge element is loop L1, which assumes an extended conformation engaged in a long intermolecular β-sheet, and the N-terminal fragment of the molecule (β1–α–β2) is anchored in the domain-swapped partner molecule (Fig. 1B). This unexpected difficulty has opened a completely new aspect of HCC research, connected with a naturally occurring L68Q variant of HCC, whose extremely high propensity for dimerization and aggregation leads to amyloid deposits in cerebral vasculature in a lethal disease known as hereditary cystatin C amyloid angiopathy [23,24]. 
The discovery of a domain-swapped HCC structure provided the first experimental evidence of 3D domain swapping in an amyloidogenic protein, and has rekindled interest in this phenomenon as a possible mechanism of amyloid formation [25,26]. Since then, 3D domain swapping has been demonstrated in two other amyloidogenic proteins, namely the prion protein and β2-microglobulin. The interest in HCC, however, has not declined, partly because the dimeric protein has been crystallized in several forms, including a polymorph in which the domain-swapped molecules have aggregated to build an infinite structure with all β-chains in a perpendicular orientation relative to a common direction, as required by the cross-β canon of amyloid fibril architecture. Although the present view of amyloid aggregation is more complex, the interest in 3D domain swapping remains high.
Until recently, the available experimental data were insufficient to decide between two models relating 3D domain swapping and amyloid structure. In one of them, 3D domain swapping would operate in a propagated, open-ended fashion linking all the molecules into a runaway structure. According to the other view, 3D domain swapping would only be necessary for the formation of dimeric building blocks, which would aggregate to form fibrils using a different mechanism. Two recently reported experiments have demonstrated that, at least in the amyloid fibrils of T7 endonuclease and of HCC, 3D domain swapping in the propagated mode is taking place.
A related research interest is focused on ways to prevent oligomerization and aggregation of amyloidogenic proteins. In one study, catalytic amounts of antibody or enzymatically inactive papain prevented dimerization of wild-type and L68Q HCC . Also, it was possible to inhibit dimerization, oligomerization and amyloid formation of HCC by site-directed mutagenesis of its sequence. Specifically, under the assumption that in a 3D domain-swapped dimer the organization of the structural elements closely resembles the monomeric fold, the dimeric structures of HCC were analyzed in order to identify places where a covalent crosslink would tether those structural elements of the monomer that undergo separation during domain swapping. Thus, pairs of juxtaposed Cys residues were introduced into the HCC sequence, with the expectation that their connection through a disulfide bond would provide the necessary crosslinkage . Two monomer-stabilizing disulfide bridges were introduced in this manner, between strands β2 and β3 (monomer-stabilized HCC-stab1 double mutant L47C/G69C) or between the α-helix and strand β5 (HCC-stab2 double mutant P29C/M110C).
In this work, we have crystallized the HCC-stab1 mutant and determined its 3D structure at 1.7 Å resolution. The structure confirms the success of the protein engineering experiment to abrogate 3D domain swapping, and demonstrates for the first time HCC folded as a monomeric protein. It allows verification of the earlier predictions based on the domain-swapped form or on the crystal structure of chicken cystatin [13,35,36]. Importantly, the structure of HCC-stab1 defines the so-far unknown conformation of loop L1, an element that is essential for the inhibition of papain-like cysteine proteases.
Results and Discussion
Structure solution and refinement
Initial analysis of the diffraction pattern of poorly diffracting HCC-stab1 crystals (2.6 Å) suggested that they have hexagonal symmetry, with 12 protein molecules in the unit cell. A successful run of molecular replacement with chicken cystatin [Protein Data Bank (PDB) code: 1CEW] as a model confirmed the correctness of the P6₁ space group and, indeed, revealed two HCC-stab1 molecules in the asymmetric unit. A complete model with correct sequence was obtained after several rounds of manual rebuilding and maximum likelihood refinement. Although the atomic model agreed well with electron density maps, the refinement was characterized by high R-factors. Re-examination of the data revealed hemihedral twinning with a twin law (h, –h – k, –l). The same kind of twinning was detected in a new dataset, with a resolution of 1.7 Å, that was collected for a crystal from a different crystallization trial. The subsequent refinement, carried out in refmac5 with the appropriate twin option, used the new data and a set of Rfree reflections selected in pairs according to the twin law. The final value of the twinning fraction, α = 0.452, indicates almost perfect twinning. The refinement converged with R and Rfree of 0.138 and 0.167, respectively, and with other parameters as listed in Table 1.
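The effect of the twin law on the data can be illustrated with a minimal sketch. It assumes the usual hemihedral model, in which each observed intensity is a weighted sum of the true intensities of two twin-related reflections; the function names and the numerical intensities are hypothetical and are not taken from the refinement.

```python
def twin_mate(hkl):
    """Apply the twin law (h, -h-k, -l) to a Miller index."""
    h, k, l = hkl
    return (h, -h - k, -l)


def twinned_intensity(i_a, i_b, alpha):
    """Observed intensity for a reflection with true intensity i_a whose
    twin mate has true intensity i_b, given twin fraction alpha."""
    return (1.0 - alpha) * i_a + alpha * i_b


def detwin(j_a, j_b, alpha):
    """Recover the two true intensities from a twin-related pair of
    observations. The problem becomes ill-conditioned as alpha nears 0.5."""
    denom = 1.0 - 2.0 * alpha
    if abs(denom) < 1e-9:
        raise ValueError("perfect twinning: the pair cannot be separated")
    i_a = ((1.0 - alpha) * j_a - alpha * j_b) / denom
    i_b = ((1.0 - alpha) * j_b - alpha * j_a) / denom
    return i_a, i_b


# Hypothetical pair of true intensities, mixed with alpha = 0.452
j1 = twinned_intensity(100.0, 40.0, 0.452)
j2 = twinned_intensity(40.0, 100.0, 0.452)
print(twin_mate((1, 2, 3)), round(j1, 2), round(j2, 2))
```

Applying the twin law twice returns the original index, which is why reflections fall into pairs and why the Rfree set can be selected pair-wise. With α = 0.452 the denominator 1 – 2α is only 0.096, so algebraic detwinning would amplify measurement errors roughly ten-fold; this is consistent with refining against the twinned data directly.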
Table 1. Data collection and refinement statistics.
Cell dimensions (Å): a = 76.26, c = 97.57
Reflections work/test set
No. of protein/water atoms
<B-factors> protein/water (Å²)
Rmsd from ideal
  Bond lengths (Å)
  Bond angles (°)
Ramachandran statistics of ϕ/ψ angles (%)
a Values in parentheses are for the highest-resolution shell.
Description of monomeric HCC structure
The two HCC-stab1 molecules (A and B) in the asymmetric unit display the canonical cystatin fold, (N)–β1–α–β2–L1–β3–AS–β4–L2–β5–(C), with a five-stranded antiparallel β-sheet gripped around a long α-helix (Fig. 1). The AS is a broad irregular ‘appending structure’ positioned at the opposite end of the β-sheet relative to the N-terminus/loop L1/loop L2 edge, which is the papain-binding epitope. β1 is the shortest element of the five-stranded antiparallel β-sheet, comprising only two residues. In both molecules, the first 11 residues are disordered and not visible in electron density maps. Two AS residues, different in each molecule, are also disordered: Pro78-Leu79 in A and Leu80-Asp81 in B. Some of the disulfide bridges are evidently partially broken, an effect that can be attributed to radiation damage.
The fold of monomeric cystatin C contains four β-bulges, which endow the molecular β-sheet with very strong curvature. Three of the β-bulges are classified as antiparallel classic type, and the residues at which the pattern of hydrogen bonds is broken are Asp65 (at β2–β3), Glu67 (β2–β3), and Gln100 (β4–β5). The fourth β-bulge, representing a special antiparallel type, is created by His43 (β2–β3). An identical system of deviations from regular β-sheet geometry exists in the HCC dimers with 3D swapped domains, as well as in chicken cystatin.
β-Bulges, which disrupt the continuity of β-interactions, are considered to be a strategy to avoid protein aggregation through intermolecular β-sheet extension [39,40]. For this reason, β-bulges are desired at edge-forming (terminal) β-strands, but are not very common in the β-sheet interior. However, the situation in HCC is quite the opposite. Three of the β-bulges are located exactly within the inner β-sheet elements, namely at the β2–β3 junction. It is very intriguing to note that it is this very junction that gets separated during the 3D domain-swapping event in HCC. It is therefore very likely that the β-bulges destabilize the HCC β-sheet in this region, thus promoting partial protein unfolding and, eventually, oligomerization and higher-order aggregation.
The disulfide bridges
The electron density map provides clear evidence of the existence of the engineered Cys47–Cys69 disulfide bond (Fig. 2), introduced to covalently link strands β2 and β3 of one monomer. A comparison with the structures of 3D domain-swapped HCC (e.g. PDB code: 1G96) (Fig. 1B) or monomeric chicken cystatin (PDB code: 1CEW) shows that the presence of this extra bridge does not perturb the conformation of the main chain in the region of these point mutations. This is possible because the side chain of Cys47 fills the space of the native Leu47, and Cys69, replacing a Gly, occupies the empty space close to the Cα atom. Considering that the rigid fragments (i.e. the β-sheet and α-helix) of 1G96 and 1CEW agree very well with the backbone trace of the analogous elements of HCC-stab1, we may assume that HCC-stab1 correctly represents the structure of the native HCC monomer.
However, it became clear in the course of the refinement that the new disulfide bond had to be partially broken to fit the 2Fo – Fc electron density map contoured at the 1.0σ level. Disruption of disulfide bonds is a common effect observed in protein crystals exposed to intense X-ray radiation, and is caused by the reducing effect of free electrons generated in the ionization events. Consequently, the Cys47–Cys69 bond has been modeled with partial occupancy (40% in A, and 60% in B), and the remaining occupancy has been allocated to free -SH groups according to indications from an Fo–Fc map contoured at the 3.0σ level. As in the preliminary biochemical experiments (including gel filtration) there was no indication of dimerization, it must be concluded that the disruption of the Cys47–Cys69 bond has occurred during the X-ray exposure of the crystal. Another ionizing radiation-driven disruption of a disulfide bond was found at the Cys73–Cys83 bridge of molecule A, which was modeled with 50% occupancy. In molecule B, this bond is intact. The second native disulfide bond, Cys97–Cys117, is intact in both molecules. The above observations illustrate the fact that ionization-induced disruption of a disulfide bond depends on its accessibility in both intramolecular terms and in the crystal packing context.
The new disulfide bond is right-handed, with C–S–S–C torsion angles (χ3) of 74.2° and 75.2° in molecules A and B, respectively. The remaining two, native, disulfide bonds in molecules A and B have the corresponding torsion angles of −89.0° and −97.9° for Cys73–Cys83 and 114.7° and 114.5° for Cys97–Cys117, thus defining them as left-handed and right-handed, respectively. A survey of these bridges in the HCC dimers shows that the C-terminal Cys97–Cys117 bond has a rigid right-handed form, whereas the Cys73–Cys83 bond, found in the AS region, has a variable configuration.
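The handedness assignment follows directly from the sign of the C–S–S–C torsion angle. The sketch below computes a signed dihedral angle from four atomic positions and classifies the bridge; the function names are hypothetical, the sign follows the standard convention in which a clockwise rotation viewed along the central bond is positive, and the coordinates in the usage lines are an artificial test geometry, not atoms from the model.

```python
import math


def dihedral(p1, p2, p3, p4):
    """Signed dihedral angle (degrees) for four points, e.g. Cb-Sg-Sg'-Cb'."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    b2_len = math.sqrt(dot(b2, b2))
    y = dot(b2, cross(n1, n2)) / b2_len
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def disulfide_handedness(chi3_deg):
    """Positive chi3 is right-handed, negative chi3 is left-handed."""
    return "right-handed" if chi3_deg > 0 else "left-handed"


# Artificial geometry with a +90 degree torsion
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
# chi3 values quoted in the text for molecule A
for chi3 in (74.2, -89.0, 114.7):
    print(chi3, disulfide_handedness(chi3))
```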
Conformation of loop L1
The present study reveals, for the first time, the structure of intact loop L1 of HCC, which structurally comprises VAG(57–59). However, when referring to loop L1 as part of the inhibitory epitope, one has to include two additional residues at the C-terminal end of strand β2. In this article, we follow the standard L1 nomenclature , referring to the QIVAG(55–59) pentapeptide. The electron density in the L1 area of molecules A and B is of very high quality, allowing for modeling of the backbone and side chain conformations without ambiguity (Fig. 3). The VAG(57–59) triplet can be classified as an inverse γ-turn [41,42], with a hydrogen bond between residues i and i + 2, and i + 1 backbone dihedral angles of −70.4° and −89.0° (ϕ) and 69.1° and 74.4° (ψ) in molecules A and B, respectively. Of all the L1 amino acids, Val57 shows the highest deformation, with backbone torsion angles (ϕ/ψ) that locate it in the lower left quadrant of the Ramachandran plot, in the additionally allowed (−112.9°/−133.4°, molecule B) or generously allowed (−130.9°/−147.4°, molecule A) regions.
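The γ-turn assignment can be checked against the backbone angles of the central (i + 1) residue. The sketch below assumes commonly quoted ideal values for the inverse γ-turn (ϕ ≈ −79°, ψ ≈ 69°) and the classic γ-turn (ϕ ≈ 75°, ψ ≈ −64°), and an arbitrary tolerance of 40°; these reference values, the tolerance and the function names are assumptions made for illustration, and the i to i + 2 hydrogen bond criterion is not tested.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


# Assumed ideal (phi, psi) values for the central residue of each turn type
IDEAL_TURNS = {
    "inverse gamma-turn": (-79.0, 69.0),
    "classic gamma-turn": (75.0, -64.0),
}


def classify_gamma_turn(phi, psi, tol=40.0):
    """Name the gamma-turn type whose ideal angles lie within tol degrees."""
    for name, (phi0, psi0) in IDEAL_TURNS.items():
        if angle_diff(phi, phi0) <= tol and angle_diff(psi, psi0) <= tol:
            return name
    return "not a gamma-turn"


# i + 1 angles reported for the VAG(57-59) triplet in molecules A and B
print(classify_gamma_turn(-70.4, 69.1))
print(classify_gamma_turn(-89.0, 74.4))
```

Under these assumptions both molecules fall in the inverse γ-turn class, whereas the Val57 angles quoted above (for example −112.9°/−133.4°) do not match either turn type, as expected for a β-region residue preceding the turn.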
In chicken cystatin (PDB code: 1CEW), which is the closest structural analog available for comparison, most of the L1 residues have torsion angles in the preferred regions, except Ser56, which is located at the apex of the loop in a position occupied in HCC-stab1 by Ala58, and which is an evident outlier (ϕ = −176.7°, ψ = −56.1°). This anomaly is very difficult to explain, as the two inhibitors have similar affinities for the cognate enzymes. It is interesting to note that this conformational dissimilarity is coupled with a quite different chemical character of the residue in the apex position of loop L1 (Ala in HCC, and Ser in chicken cystatin). In the structure of stefin B in complex with papain (PDB code: 1STF), the L1 fragment [QVVAG(53–57)] has no outliers in the Ramachandran plot. The residue in question, namely Val55, which is equivalent to Val57 in the HCC sequence, adopts a β-type conformation with ϕ/ψ values (−118.7°/−148.6°) very close to those found in HCC-stab1.
Structural environment of Leu68
Leu68 is of particular importance because its naturally occurring mutation to Gln, endemic to the Icelandic population, results in the creation of a highly amyloidogenic form of HCC. Leu68 is located at the end of strand β3 of the β-sheet, just before the polypeptide chain enters the poorly structured and conformationally variable AS. The Leu68 side chain protrudes from the concave part of the β-sheet towards the interface with the α-helix. The side chains of Val31 and Tyr34 (from the α-helix), Val66 (β3), Phe99 (β4), and Cys97–Cys117 (β4–β5), and the backbone of the SRA(44–46) segment, form a hydrophobic pocket in which the side chain of Leu68 is nearly ideally nested.
Until now, the structural consequences of the L68Q mutation for the stability of monomeric HCC could only be inferred from the structures of 3D domain-swapped dimers of HCC. The present structure provides the first possibility of visualizing the Leu68 side chain in its true monomeric context. Comparison with the Leu68 environment reported in the dimeric structures shows an almost identical arrangement, confirming that the earlier analyses, using the dimeric structure as a template, were essentially correct . The increased size and incompatible chemical character of a Gln side chain at position 68 would result in repulsive interactions destabilizing the α–β interface and leading to an increased likelihood of a partial unfolding process, in which these two main structural elements (i.e. the α-helix and the β-sheet) would separate, thus promoting oligomerization, aggregation and, possibly, fibril formation.
The edge of the HCC molecule containing loop α/β2 and the AS has been reported to harbor a site that is important for the inhibition of legumain . The AS itself spans 26 residues, from Cys69 to Lys94, and has an irregular conformation. The AS is located at the opposite end of the β-sheet to the N-terminus/L1/L2 epitope (Fig. 1A), so there should be no interference between papain and legumain binding by HCC. Consistent with this conclusion is the observation that the 3D domain-swapped dimer of HCC shows unaltered inhibition of legumain, whereas it is incapable of papain binding, because of the absence of the L1 element of the recognition site . The backbones of molecules A and B follow the same AS trace, complementing each other in the regions with gaps (Fig. 4A). Moreover, the conformation found in the present structure is very similar to the backbone traces reported in all of the dimeric structures of HCC and in monomeric cystatin D  (Fig. 4B). This conformation is, however, different from the models proposed for chicken cystatin or for human cystatin F , in which, respectively, a prominent α-helix or a helical coil is present (Fig. 4C).
Crystal packing and molecular interactions
The HCC-stab1 molecules A and B in the asymmetric unit form crystal packing interactions leading to their association into two types of easily recognizable assemblies. In the most conspicuous contact, molecules A and B associate via their AS loops (Fig. 5), burying about 920 Å² of the molecular surface of each partner in a network of interactions that includes several van der Waals contacts and four hydrogen bonds (Table 2). The two molecules are related by an almost perfect noncrystallographic two-fold axis along [1 0]. The second mode of association buries a much smaller surface area (about 460 Å² per monomer), but it is important because it illustrates the propensity of HCC to undergo intermolecular β-interactions at the edge of the molecular β-sheet. In this pairing scheme, molecule A interacts with a crystallographic copy of molecule B′ (–y, x – y, z + 1/3) via a pseudo-two-fold rotation coupled with a 3 Å translation (Fig. 5). These two molecules are linked via four main chain–main chain hydrogen bonds (Table 2) to form an intermolecular parallel β-sheet, utilizing their β5 elements. Such a mode of interaction has already been observed in the tetragonal crystal structure of dimeric HCC (PDB code: 1TIJ). Two additional hydrogen bonds involving side chains support these A–B′ interactions. The noncrystallographic symmetry axis relating the two molecules is aligned with the  direction. The translational component of this noncrystallographic symmetry is necessary to bring the parallel β-sheet interactions into register (Fig. 5). It should be noted that crystallographic symmetry requires both types of dimer to sit on the same axis, so none of the diagonal directions (related by the 3₁ axis) is close to a perfect dyad. Combination of the crystallographic 6₁ axis with the diagonal noncrystallographic dyads leads to additional noncrystallographic two-fold axes along the x and y directions.
The molecules related by this (nearly perfect) operation present an additional scheme of intermolecular interactions. Specifically, there are four hydrogen bonds between molecules A and B′′ (y – 1, –x + y, z + 5/6) at an interface formed by loop β1/α and the N-terminus of the α-helix. The contact area is about 220 Å² per molecule. The crystallographic symmetry brings into contact molecules B and B′′ (with the participation of loop L2), with only two hydrogen bonds but many van der Waals contacts and a buried surface area of about 450 Å² per monomer. Similar interactions are found between molecules A and A′′.
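The symmetry operations that generate the B′ and B′′ copies act on fractional coordinates. As a minimal sketch (the function names and the test position are hypothetical), the two operations quoted above can be written out explicitly, and it can be verified that three successive applications of (–y, x – y, z + 1/3) return the starting position shifted by one full unit cell along z, as expected for a three-fold screw operation.

```python
from fractions import Fraction as F


def op_b_prime(xyz):
    """Operation (-y, x - y, z + 1/3), used in the text to generate B'."""
    x, y, z = xyz
    return (-y, x - y, z + F(1, 3))


def op_b_double_prime(xyz):
    """Operation (y - 1, -x + y, z + 5/6), used in the text to generate B''."""
    x, y, z = xyz
    return (y - 1, -x + y, z + F(5, 6))


# Hypothetical fractional position, kept exact with rational numbers
p = (F(1, 10), F(1, 5), F(0))
q = op_b_prime(op_b_prime(op_b_prime(p)))
print(q)                     # same x and y, z advanced by one cell edge
print(op_b_double_prime(p))
```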
Table 2. Interfaces formed by pairs of HCC-stab1 molecules through crystal contacts. Each entry gives the structural element and atom of the first molecule, the atom and structural element of the second molecule, and the hydrogen bond distance (Å).
A–B
AS A:D87 (Oδ1) – (N) V18:B β1/α; 2.96
AS A:N82 (Oδ1) – (Nη2) R24:B α; 2.78
β1/α A:V18 (N) – (Oδ1) D87:B AS; 2.91
AS A:N82 (Nδ1) – (Oδ2) D28:B α; 3.09a
A–B′
AS A:R93 (Nη2) – (O) D119:B β5; 3.00
β5 A:S115 (N) – (O) S113:B β5; 3.10
β5 A:C117 (N) – (O) S115:B β5; 2.95
β5 A:S115 (O) – (N) S115:B β5; 2.87
β5 A:C117 (O) – (N) C117:B β5; 2.98
β5 A:D119 (Oδ1) – (N) D119:B β5; 2.73
A–B′′
β1/α A:E19 (O) – (N) E21:B α; 3.19
β1/α A:E19 (Oε1) – (N) G22:B α; 3.03
α A:E21 (N) – (O) E19:B β1/α; 3.09
α A:G22 (N) – (Oε1) E19:B β1/α; 2.95
B–B′′
α/β2 B:M41 (N) – (O) P105:B L2; 3.11
β2 B:Y42 (Oη) – (Oε2) E21:B α; 2.80
A–A′′
α/β2 A:M41 (N) – (O) P105:A L2; 3.07
β2 A:Y42 (Oη) – (Oε2) E21:A α; 2.43
a The side-chain amide of N82 has been flipped, as it evidently has the wrong rotation in the PDB entry 3GAX.
The A–B ‘dimer’ propagates along the 3₁ axis, utilizing alternating AS–AS and β5–β5 interactions, so there is no indication of oligomeric assembly involving endless β-sheet formation along the crystal z-axis, as observed in some of the dimeric HCC crystal structures. The exposed β-chains at the opposite edge of the β-sheet (β1 and β2) do not participate in any type of intermolecular β–β interaction, so no combination of symmetry operations, real or pseudo, could lead to such an assembly.
It is of note that the pseudosymmetric dyad along  coincides with the two-fold axis relating the twin domains, which probably promotes the prevalence and character (α about 0.5) of the twinning phenomenon observed for these HCC-stab1 crystals .
Comparison with other cystatin models
The overall fold of HCC-stab1 shows no significant differences when compared to the available models of type 2 cystatins, such as chicken cystatin (PDB code: 1CEW), cystatin D (PDB code: 1RN7), or cystatin F (PDB code: 2CH9). In addition, it is possible to compare the monomeric fold of HCC-stab1 with the complete folding units ‘extracted’ from the 3D domain-swapped dimers of HCC, namely 1G96 (one copy in the asymmetric unit), 1TIJ (two copies), and 1R4C (eight copies). In all cases, the main differences are within the AS, and concern, for example, the presence of α-helical segments in 1CEW and 2CH9 (Fig. 4C). When the AS fragment is excluded from the alignment, the Cα superposition becomes much closer, with, for instance, an rmsd value of 0.70 Å for 1G96.
On the other hand, type 1 cystatin models, such as the domain-swapped stefin B (PDB code: 2OCT), and monomeric stefin A in complex with cathepsin H (PDB code: 1NB5) or stefin B in complex with papain (PDB code: 1STF), show very significant deviations from HCC-stab1, mainly in the course of the α-helix (which, in type 1 cystatins, is significantly bent), in the positioning and curvature of the β-sheet, and in the AS, which is essentially absent in the stefin structure. The results of least-squares superpositions of the HCC-stab1 molecule A and other cystatin and stefin models are summarized in Table 3.
Table 3. Rmsd values (Å) between Cα atoms of structurally aligned cystatin models, defined by their PDB codes. For the 3D domain-swapped dimers of HCC, the superposition is calculated for one half of the dimer, corresponding to a cystatin folding unit and containing residues 1–56 from one monomer (A) and 60–120 from the complementary chain (B). Explanation of PDB codes: 1G96, two-fold symmetric 3D domain-swapped HCC dimer; 1R4C, 3D domain-swapped dimer of N-truncated HCC; 1TIJ, 3D domain-swapped HCC dimer; 1RN7, monomeric human cystatin D; 1CEW, monomeric chicken cystatin; 2CH9, monomeric human cystatin F; 1STF, stefin B from papain complex; 1NB5, stefin A from cathepsin H complex.
1G96 No ASa
1R4C (A, B)
1TIJ (A, B)
a Residues 69–94, forming the AS loop, have been omitted from the calculations.
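The rmsd values of the kind reported in Table 3 can be illustrated with a short sketch. It assumes that the two models have already been superposed by a separate least-squares step and that equivalent residues share the same residue number; the function name, the dictionary layout and the toy coordinates are hypothetical. The optional exclusion list mirrors the omission of the AS loop (residues 69–94).

```python
import math


def ca_rmsd(ca_a, ca_b, exclude=()):
    """Rmsd (in Å) over Cα atoms present in both models.

    ca_a, ca_b: dicts mapping residue number -> (x, y, z), already superposed.
    exclude: residue numbers to leave out of the calculation.
    """
    skip = set(exclude)
    shared = sorted((set(ca_a) & set(ca_b)) - skip)
    if not shared:
        raise ValueError("no common residues to compare")
    total = sum(math.dist(ca_a[n], ca_b[n]) ** 2 for n in shared)
    return math.sqrt(total / len(shared))


# Toy coordinates: two core residues and one AS residue that deviates strongly
model_a = {12: (0.0, 0.0, 0.0), 13: (1.0, 0.0, 0.0), 80: (0.0, 0.0, 0.0)}
model_b = {12: (0.0, 0.0, 1.0), 13: (1.0, 0.0, 1.0), 80: (0.0, 0.0, 5.0)}
print(ca_rmsd(model_a, model_b))                        # all residues
print(ca_rmsd(model_a, model_b, exclude=range(69, 95)))  # AS loop omitted
```

In this toy case a single strongly deviating loop residue dominates the overall value, which illustrates why excluding the AS region gives a much closer superposition statistic for the conserved core.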
Manual docking of monomeric cystatin C in the active site of papain
The structural elements of cystatins and stefins that function as epitopes in the inhibitory interactions with their target enzymes are loops L1 and L2 and the N-terminus . In cystatin C, the inhibitory part of loop L1 has the sequence QIVAG(55–59), whereas the most important residues from loop L2 are Pro105 and Trp106. In an unrelated family of cysteine protease inhibitors, represented by chagasin [46–48], the enzyme-blocking epitope is formed by loops L4, L2 and L6, which correspond to the N-terminus/L1/L2 elements of cystatins, in that order. Loop L6 contains conserved Pro and Trp/Phe residues at positions that are equivalent to Pro105 and Trp106 in the HCC sequence.
The exact modes of cysteine protease–inhibitor interactions have been described, for example, for stefin B  (PDB code: 1STF) and chagasin  (PDB code: 3E1Z) in their complexes with papain. These two structures can be used as templates for tentative modeling of HCC-stab1–papain interactions. As a first attempt, the HCC-stab1 molecule was superposed onto stefin B from the 1STF complex, using a secondary structure matching approach; this showed that the two molecules do not align very well, with an overall Cα rmsd as high as 2.51 Å. Moreover, such an overall superposition leads to significant discrepancies in the epitope regions, e.g. a large deviation in the Cα traces of loop L2. Thus, in the next modeling experiment, it was assumed that optimal fit in loops L1 and L2 should be a priority, and Gln55–Gly59 and Val104–Gln107 of HCC-stab1 were superposed onto their equivalents in stefin B. This comparison, however, indicated that the coordinates of loops L2 do not agree with each other, as illustrated by the relative positions of the Cα atom of Pro105, which differ by 1.77 Å in the two structures (Fig. 6). Also, forcing loops L2 to superpose worsens the alignment of loops L1, which in a simple L1/L1 superposition overlap almost perfectly (rmsd of 0.13 Å). These results clearly show that loop L2 is different in HCC and stefin B. One reason for this variability may be the lack of sequence conservation, i.e. the fact that the VPWQ motif of HCC is replaced by LPHE in stefin B. An additional factor may be the involvement of loop L2 of both molecules in the HCC-stab1 structure in similar packing interactions, which may constrain it in a non-native conformation. The third possibility is, of course, that the conformation of the enzyme-binding epitope, in particular the relative disposition of loops L1 and L2, undergoes an induced-fit adaptation on enzyme complex formation. 
The first hypothesis could be partly tested using an HCC–chagasin alignment, because chagasin contains a Trp in a position equivalent to the Trp106 site of HCC. Such a comparison would not be possible by direct HCC–chagasin superposition, owing to the completely different folds of the proteins. It could be achieved, however, via an intermediate alignment of the papain components of the chagasin and stefin B complexes. Then, when loop L2 from the above L1/L2 superposition of HCC-stab1 and stefin B is analyzed, it is noted that the positions of Pro105 and Trp106 match those of their chagasin counterparts much better than their equivalents in stefin B (Fig. 6). Despite this improvement, Trp106 of HCC-stab1 still clashes with the Gln142 residue of papain, but this obstacle could be avoided by slight conformational adaptations of loop L2. A minor rearrangement may also be required to bring Pro105 into an optimal position with respect to Leu143 of papain, and to optimize the π-type interactions between Trp106 and the cluster of aromatic residues of papain that cap Asn175, the last residue of the catalytic triad . In the current model of the HCC-stab1–papain complex, the interaction interface buries 564 Å2 and 530 Å2 of solvent-accessible surfaces of the interacting partners.
The interface between papain and loop L1 of HCC-stab1 seems to be dominated by hydrophobic interactions, and thus resembles the situation in the stefin B complex rather than in the chagasin complex. No direct interference with the Cys25-His159-Asn175 catalytic triad of papain, as observed for the chagasin–papain complex, is predicted from the current model.
The hypothetical enzyme-binding mode of the N-terminus of HCC could not be analyzed, owing to the absence of this element in the crystallographic model of HCC-stab1. However, the backbone trace near the visible beginning of the HCC-stab1 molecules follows closely the equivalent fragment of stefin B, suggesting that its conformation in these two proteins could be similar. By analogy with other enzyme complex structures, the N-terminal peptide of HCC-stab1 would be expected to form β-sheet interactions with the enzyme.
The above comparisons lead to the conclusion that the predicted interactions of HCC with its target enzyme, based on the structure of the current HCC-stab1 model, are compatible with various observations reported for complexes with other inhibitors. However, it is not possible to build, by simple target-based docking, a model that would explain all aspects of the inhibitory interactions. Therefore, a reliable description of HCC–papain interactions will require an independent experimental study. Work in this direction is in progress.
Experimental procedures
Protein expression and purification
A variant of HCC with two Cys mutations introduced at Leu47 and Gly69 was expressed in Escherichia coli MC1061 and purified using a modified procedure of Nilsson et al. Briefly, expression was induced at 42 °C for 3 h when D600 nm was approximately 5. The protein harvested from the periplasmic space was purified by anion exchange chromatography using Q-Sepharose and a buffer containing 20 mM ethanolamine (pH 9.0) and 1 mM benzamidinium chloride. After concentration by ultrafiltration, the sample was subjected to size exclusion chromatography using an Amersham Biosciences FPLC Superdex HR 75 column, equilibrated with 10 mM sodium phosphate buffer (pH 7.4), 140 mM NaCl, 3 mM KCl, and 1 mM benzamidinium chloride. In the last purification step, the protein solution was dialyzed against 20 mM sodium citrate buffer (pH 6.5), with 1 mM benzamidinium chloride. The dialyzed protein was divided into 0.5 mg aliquots, lyophilized, and stored at −20 °C.
Before crystallization, protein samples were dissolved in 150 mM sodium phosphate buffer (pH 7.5), passed through a 0.22 μm Millipore filter, and subjected to size exclusion chromatography using a GE Healthcare HiLoad 16/60 Superdex 200 prep. grade column. Fractions containing monomeric HCC-stab1 were combined and concentrated on Microcon filters (10 kDa cut-off) to 10 mg·mL−1. The final buffer concentration was < 20 mM.
Crystallization
Crystallization experiments were performed at 19 °C, using the hanging-drop vapor-diffusion method and a grid screen with different concentrations of ammonium phosphate versus pH. Drops were mixed from 1 : 1 (v/v) ratios of the protein and precipitant solutions. The best crystals (0.35 × 0.26 × 0.26 mm) grew above a reservoir containing 1.25 M ammonium phosphate (pH 7.0).
Data collection and processing
A preliminary X-ray diffraction dataset extending to 2.6 Å resolution (data not shown) was collected at beamline I911-3 in MAX-lab (Lund, Sweden). Later, a new dataset extending to 1.7 Å resolution was collected at beamline BL14.1 of the BESSY synchrotron (Berlin, Germany), using a Rayonix MX-225 3 × 3 CCD detector. Crystals were cryoprotected in mother liquor supplemented with 25% (v/v) glycerol, and flash-vitrified at 100 K in a nitrogen gas stream. All diffraction data were processed and scaled in the Laue class 6/m with hkl2000. A summary of data collection and processing is presented in Table 1.
Structure solution and refinement
The structure was solved by molecular replacement with chicken cystatin (PDB code: 1CEW) as a search probe, using phaser. Manual model building was performed in coot, and crystallographic refinement was performed with refmac5. Despite successful phasing with molecular replacement in space group P6₁, the refinement had high R and Rfree values of 0.314 and 0.394, respectively. Re-examination of the space group assignment and of the diffraction data for the possibility of crystal twinning, carried out in the ccp4 programs truncate and detwin, as well as in cns, gave a strong indication of hemihedral twinning, with a twin operation (h, –h – k, –l) corresponding to the higher Laue class symmetry 6/mmm. The same conclusions were reached for the second dataset, which was used for all subsequent calculations. The final refinement was carried out in refmac5, with the tls and twin options included. The refinement converged with a final R-factor of 0.138 (Rfree = 0.167) for all data, and a model characterized by rmsd from ideal bond lengths of 0.019 Å, with 93.7% of all residues in the most favored areas of the Ramachandran plot (no residues in disallowed regions). The refinement statistics are shown in Table 1.
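For orientation, the R-factors quoted above follow the conventional definition R = Σ||Fo| – |Fc|| / Σ|Fo|, evaluated separately for the working and test reflections. The sketch below is an illustration only: it assumes that the calculated amplitudes are already on the scale of the observed ones, it does not reproduce how a refinement program combines twin-related contributions, and the function names and amplitudes are hypothetical. The second function shows one simple way of keeping twin-related reflections in the same set, by labelling each reflection with a key shared with its twin mate.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor for paired amplitude lists."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("amplitude lists must be non-empty and of equal length")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def twin_pair_key(hkl):
    """Key shared by a reflection and its twin mate under (h, -h-k, -l)."""
    h, k, l = hkl
    mate = (h, -h - k, -l)
    return min(hkl, mate)


# Hypothetical amplitudes
print(r_factor([10.0, 20.0], [9.0, 22.0]))
# A reflection and its twin mate map to the same key
print(twin_pair_key((1, 2, 3)), twin_pair_key((1, -3, -3)))
```

Assigning free flags per key rather than per reflection would place both members of a twin-related pair in the same set, in the spirit of the pair-wise Rfree selection described above.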
Acknowledgements
This work was supported, in part, by grants from the Polish State Committee for Scientific Research (Project no. 4 T09A 039 25) and from the Swedish Science Research Council (Project no. 5196), by a subsidy from the Foundation for Polish Science to M. Jaskolski, and by a postdoctoral fellowship (Project no. PBZ/MEiN/01/2006/06) from the Polish Ministry of Science and Higher Education to R. Kolodziejczyk. Some of the calculations were performed in the Poznan Metropolitan Supercomputing and Networking Center.