Crystal structure of the β-chain of human hepatocyte growth factor-like/macrophage stimulating protein


E. Gherardi, MRC Centre, Hills Road, Cambridge CB2 2QH, UK
Fax: +44 1223215308
Tel: +44 1223215308


Hepatocyte growth factor-like/macrophage stimulating protein (HGFl/MSP) and hepatocyte growth factor/scatter factor (HGF/SF) define a distinct family of vertebrate-specific growth factors structurally related to the blood proteinase precursor plasminogen and with important roles in development and cancer. Although the two proteins share a similar domain structure and mechanism of activation, there are differences between HGFl/MSP and HGF/SF in terms of the contribution of individual domains to receptor binding. Here we present a crystal structure of the 30 kDa β-chain of human HGFl/MSP, a serine proteinase homology domain containing the high-affinity binding site for the RON receptor. The structure describes at 1.85 Å resolution the region of the domain corresponding to the receptor binding site recently defined in the HGF/SF β-chain, namely the central cleft harboring the three residues corresponding to the catalytic ones of active proteinases [Gln522 (c57), Gln568 (c102) and Tyr661 (c195); numbers prefixed with 'c' give the sequence position according to the standard chymotrypsinogen numbering system] and an adjacent loop flanking the S1 specificity pocket and containing residues Asn682 (c217) and Arg683 (c218), previously shown to be essential for binding of HGFl/MSP to the RON receptor. The study confirms the concept that the serine proteinase homology domains of HGFl/MSP and HGF/SF bind their receptors in an ‘enzyme-substrate’ mode, reflecting the common evolutionary origin of the plasminogen-related growth factors and the proteinases of the clotting and fibrinolytic pathways. However, analysis of the intermolecular interactions in the crystal lattice of β-chain HGFl/MSP fails to show the same contacts seen in the HGF/SF structures and does not support a conserved mode of dimerization of the serine proteinase homology domains of HGFl/MSP and HGF/SF responsible for receptor activation.


Abbreviations: HGF/SF, hepatocyte growth factor/scatter factor; HGFl/MSP, hepatocyte growth factor-like/macrophage stimulating protein; RTK, receptor tyrosine kinase(s).

Hepatocyte growth factor-like/macrophage stimulating protein (HGFl/MSP) and hepatocyte growth factor/scatter factor (HGF/SF) control the growth, movement and morphogenesis of a variety of cell types in vertebrate organisms (reviewed in [1,2]). Unlike most growth factors, which are small proteins with a relatively simple domain structure, HGFl/MSP and HGF/SF are high molecular weight glycoproteins with a complex multidomain architecture related to that of the proenzyme plasminogen [3]. Both HGFl/MSP and HGF/SF consist of six domains: (a) an N-terminal domain homologous to the plasminogen preactivation peptide; (b) four copies of the kringle domain; and (c) a C-terminal serine proteinase homology domain that lacks enzymatic activity due to mutations of critical residues at the catalytic site and S1 specificity pocket [3]. Both proteins are synthesized as single-chain precursors and subsequently processed through cleavage of a long linker peptide connecting the fourth kringle domain and the serine proteinase homology domain. This yields two-chain (α/β) proteins held together by a single disulfide bond: the larger α-chain contains the N-terminal and the four kringle domains, and the smaller β-chain corresponds to the serine proteinase homology domain.

The receptors for HGFl/MSP and HGF/SF are the receptor tyrosine kinases (RTKs) RON and MET, respectively [4–6]. RON and MET have a complex multidomain architecture [7] and, upon activation, they transduce cell signals crucial for embryo development [8–11], wound healing [12,13] and cancer growth and spreading (reviewed in [2,14]).

In contrast to their well-studied biological activities, the mechanism of receptor binding and activation by plasminogen-related growth factors is less well understood and is currently under intense investigation. There is strong evidence that both the α- and β-chains are required for receptor activation by HGF/SF and by HGFl/MSP [15–17]. However, in HGF/SF the primary (high-affinity) receptor binding site is located in the α-chain, with the β-chain contributing a lower-affinity site [15,16], whereas in HGFl/MSP the high-affinity site is located in the β-chain [17–19].

Insights into the mechanism of binding have recently been provided by a crystal structure at 3.3 Å resolution of the complex of the β-chain of HGF/SF and the MET receptor [20]. This structure defines in detail the regions of the β-chain of HGF/SF and the β-propeller domain of MET responsible for the formation of a 1 : 1 complex and the conclusions derived from the crystallographic analysis have also been corroborated by extensive mutagenesis studies [21].

Here we report a crystal structure of a fragment of HGFl/MSP consisting of the serine proteinase homology domain and the last 19 residues of the α-chain. Further, in order to understand better the structural basis of the role of proteolytic activation in receptor binding, we also present a model of the serine proteinase homology domain of HGFl/MSP in its precursor, inactive form. Analysis of the experimental structure and the model confirms that HGFl/MSP, like HGF/SF, has retained an enzyme-like mode of receptor binding involving the area corresponding to the active site of bona fide proteinases. It also shows that the process of proteolytic activation leads to a major rearrangement of the ‘active site’ region that may influence receptor binding.

Results and Discussion

Description of the structure

The structure of the β-chain of HGFl/MSP contains 4 residues of the α-chain (residues Cys468 to Arg471), 225 residues of the β-chain (residues Val484 to Met708) (c16-c242) and 154 water molecules (Table 1 and Fig. 1). The 15 remaining residues of the α-chain and the hexahistidine sequence fused at the C-terminus of the protein are not visible in the structure, presumably due to disorder.

Table 1. Crystallographic statistics for the structure of the β-chain of HGFl/MSP.

X-ray diffraction data
  X-ray source                                                       ESRF, beam line ID 13
  Space group                                                        P2₁2₁2₁
  Unit cell a, b, c (Å)                                              75.68, 75.85, 47.97
  Resolution range (Å)                                               50.0–1.85
    (highest resolution shell)                                       (1.89–1.85)
  Rsymᵃ (%)                                                          6.0
    (highest resolution shell)                                       (41.6)
  Completeness (%)                                                   98.4
    (highest resolution shell)                                       (98.3)
  No. of unique reflections                                          23 920
  Average intensity, ⟨I/σ(I)⟩                                        14.1
  % reflections with I/σ(I) > 3 in the highest resolution shell      54.6
  Wilson B-factor (Å²)                                               21.7
  Rcrystᵇ (%)                                                        19.8
    (highest resolution shell)                                       (23.6)
  Rfreeᶜ (%)                                                         23.7
    (highest resolution shell)                                       (28.2)
  No. of reflections:
    Working                                                          22 661
  Molecules per asymmetric unit                                      1
  No. of nonhydrogen atoms:
Model quality
  Estimated coordinate errorᵈ (Å)                                    0.13
  R.m.s. deviation bonds (Å)                                         0.017
  R.m.s. deviation angles (°)                                        1.59
  Overall mean B-factor (Å²)                                         25.1
Ramachandran plot analysisᵉ, percentage of residues in:
  Most favoured regions                                              88.6
  Additionally allowed regions                                       11.4
  Disallowed regions                                                 0

ᵃ $R_{\mathrm{sym}} = \sum_h |I_h - \langle I \rangle| / \sum_h I_h$, where $I_h$ is the intensity of reflection $h$ and $\langle I \rangle$ is the mean intensity of all symmetry-related reflections.
ᵇ $R_{\mathrm{cryst}} = \sum \big| |F_{\mathrm{obs}}| - |F_{\mathrm{calc}}| \big| / \sum |F_{\mathrm{obs}}|$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure factor amplitudes.
ᶜ $R_{\mathrm{free}}$ as for $R_{\mathrm{cryst}}$, using a random subset of the data (around 5%) excluded from the refinement.
ᵈ Estimated coordinate error based on the R-value as calculated by REFMAC [34].
ᵉ Calculated with PROCHECK [35].
Figure 1.

(A) Ribbon diagram of the β-chain of HGFl/MSP. The three residues replacing the catalytic triad of active proteinases, Q522 (c57), Q568 (c102) and Y661 (c195), are shown as solid sticks. The five intradomain disulfide bonds are also shown; a sixth disulfide connecting the α- and β-chains [Cys468-Cys588 (c122)] is not shown in the figure. L4, L5, L8, L10, L11 and L13 define loops discussed in the text and detailed in Fig. 2A. (B) Detailed view of the active site region of the β-chain of HGFl/MSP. The three residues corresponding to the catalytic ones of active serine proteinases are shown in red; residues corresponding to the S1 specificity pocket (L11) are shown in blue; a segment of the L13 loop is shown in yellow. The disulfide bond between C657 and C685 (c191-c220) is also shown. The figure was generated with PyMOL [36] and SPOCK.

The overall fold closely resembles that of serine proteinases of the chymotrypsinogen family, consisting of two antiparallel six-stranded β-barrels forming two lobes, at the junction of which lies the region corresponding to the active site cleft of the enzymes (Fig. 1A). The disulfide bond connecting Cys468 of the α-chain and Cys588 (c122) of the β-chain is clearly defined, as are the five remaining intradomain disulfide bonds of the β subunit: two in the N-terminal lobe, Cys507-Cys523 (c42-c58) and Cys527-Cys562 (c62-c96), and three in the C-terminal one, Cys602-Cys667 (c135-c201), Cys632-Cys646 (c168-c182) and Cys657-Cys685 (c191-c220) (Fig. 1A). A Cys672 (c206) to Ser mutation was introduced in order to prevent the unpaired Cys672 from forming an aberrant disulfide bond with Cys588 (c122), which would disrupt formation of the correct disulfide bond between the α- and β-chains. The structure of the region corresponding to the active site of bona fide proteinases is conserved, with the three residues replacing the catalytic His, Asp and Ser [Gln522 (c57), Gln568 (c102) and Tyr661 (c195)] aligned alongside the cleft (Fig. 1A). In contrast, significant differences are apparent in the structure of the region corresponding to the S1 specificity pocket. While the upper side of the pocket is substantially preserved, with the exception of the Ser to Tyr661 (c195) mutation, in the lower part Pro681 replaces a Trp residue found in most catalytically active serine proteinases and generates a turn in loop L13 (680–691) (c214-c225), which allows greater accessibility to Tyr661 (c195) (Fig. 1A). However, Tyr661 (c195) sterically reduces access to the pocket, suggesting that receptor binding of the β-chain of HGFl/MSP may only involve the entrance of the S1 specificity pocket (Fig. 1B, and below).

Comparison with the HGF/SF and plasmin β-chains and structural consequences of proteolytic activation

The β-chain of HGFl/MSP displays a high level of sequence identity (41% and 39%, respectively) and structural similarity with HGF/SF and plasminogen. Superposition of the structure of the β-chain of HGFl/MSP with those of HGF/SF [21] and plasmin [22] yields rmsd values of 2.13 and 2.48 Å over 224 and 209 amino acids, respectively. A structure-based sequence alignment for the three proteins is shown in Fig. 2A and outlines the strand and loop nomenclature used in this report. Figure 2B shows a ribbon representation of the superposition (HGFl/MSP in blue, HGF/SF in red, plasmin in grey) and demonstrates that the structures of the three proteins are closely conserved in the central region while deviating considerably in certain surface loops, for example L5, L8 and L11, but not in others, such as L13.
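As an illustration of what the rmsd values quoted above measure, the following is a minimal sketch of the root-mean-square deviation between two sets of equivalent atoms. It assumes the two coordinate lists have already been superposed and paired residue by residue (for example, one Cα atom per aligned position); the function name `rmsd` and the toy coordinates are hypothetical, and this is not the superposition procedure used to obtain the published values.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. Å)
    between two equally long lists of (x, y, z) tuples.

    Assumes the structures are already superposed and that
    coords_a[i] and coords_b[i] are structurally equivalent atoms.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))

# Toy example (hypothetical coordinates): every atom is displaced by 5 Å.
toy_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
toy_b = [(3.0, 4.0, 0.0), (4.0, 5.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
```

Because the value depends on which residues are paired, an rmsd is only meaningful together with the number of equivalent positions, which is why the text reports it over 224 and 209 amino acids.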

Figure 2.

(A) Structure-based sequence alignment of the serine proteinase homology domains of HGFl/MSP (this work), HGF/SF (1SI5, [21]), and plasmin (1BUI, [22]). Grey and magenta boxes correspond to conserved β-strands and α-helices, respectively. The alignment was constructed using the output of JOY [37]. (B) Ribbon representation of the superposition of the main chains of the HGFl/MSP (blue), HGF/SF (red) and plasmin (grey) β-chains. The figure was generated using COMPARER [38] and SPOCK.

The availability of high-resolution crystal structures of both plasmin [22,23] and plasminogen [24,25], i.e. a homologous protein in its precursor and active forms, allowed modeling of the single-chain form of the serine proteinase homology domain of HGFl/MSP and analysis of the structural changes that may result from proteolytic activation. A superposition of the model of the single-chain form and the crystal structure of the two-chain form is shown in Fig. 3A. The comparison illustrates that, as observed with plasminogen, the newly formed N-terminus (Val484) (c16) folds into a hydrophobic pocket, forming an ionic interaction with Asp660 and causing a movement in loop L11 and in loop L13, which is disulfide bonded to L11 (Fig. 3A). Loop L13 contains a cluster of three positively charged arginine residues (Arg683, Arg687, Arg689) (c230, c234 and c236), one of which (Arg683) (c230) is known to play a major role in RON binding [19], while a further triple arginine cluster is found in loop L10 (Arg637, Arg639, Arg641) (c184, c186 and c188). Together these two loops generate an extended positively charged patch on the surface of β-chain MSP that is repositioned as a result of proteolytic activation and is notably absent in the homologues plasmin and HGF/SF (data not shown). Proteolytic activation of HGFl/MSP also appears to affect the position of loops L4 and L5, but to a lesser extent than L8, L11 and L13 (Fig. 3A).

Figure 3.

(A) Structural alignment of a model of the single-chain, precursor form of the serine proteinase homology domain of HGFl/MSP (green) and the crystal structure of the corresponding two-chain form (pink). The model of the single-chain form of the protein was constructed with MODELLER [39], using the atomic coordinates of the plasminogen β-chain (PDB: 1QRZ) [24] as template. Residues discussed in the text are labeled and shown as solid sticks. The figure was generated with PyMOL [36]. (B) and (C) Surface representation of the serine proteinase domains of two-chain HGFl/MSP (B) and the model of single-chain HGFl/MSP (C). The amino acids shown in colour are either those corresponding to the catalytic triad (on the left) or amino acids in L13 that have been shown by mutagenesis to be involved in receptor binding. The figure was prepared with SPOCK.

The binding site for the RON receptor

The recent crystal structure of a complex between the β-chain of HGF/SF and a fragment of the MET receptor (PDB accession: 1SHY) [20] showed that the binding of the β-chain of HGF/SF to MET involves an extended area centered around the ‘active site’ of the homologous enzymes. This area includes the residues corresponding to the catalytic ones [Gln534 (c57), Asp578 (c102) and Tyr673 (c195)] on both sides of the central cleft as well as several residues in L13, namely Val692 (c215), Pro693 (c216), Gly694 (c217), Arg695 (c218) and Gly696 (c219), which together form a continuous binding surface [20,21]. The position of the three residues of HGFl/MSP that correspond to Gln534, Asp578 and Tyr673 in HGF/SF, namely Gln522 (c57), Gln568 (c102) and Tyr661 (c195), is shown in Fig. 3B. Figure 3B also shows the position of two residues in loop L13, Asn682 (c217) and Arg683 (c218), that were shown previously to be important or essential for binding of HGFl/MSP to the RON receptor [19]. Therefore, although the mutagenesis data are limited compared to HGF/SF and the individual contributions of Gln522 (c57), Gln568 (c102), Tyr661 (c195), Ile680 (c215), Pro681 (c216) and Val684 (c219) remain to be confirmed, the structural data presented here indicate that the receptor binding sites of the β-chains of HGFl/MSP and HGF/SF are highly conserved and that the binding specificity of the two growth factors for their cognate receptors depends on local sequence variation and not on the utilization of different areas of the domain surface.

Mapping the location of amino acids Gln522 (c57), Gln568 (c102), Tyr661 (c195), Asn682 (c217) and Arg683 (c218) of HGFl/MSP onto the surface of the model of the single-chain form of the domain (Fig. 3C) illustrates dramatically the putative effect of proteolytic activation of the domain and may provide a basis for the different binding affinities reported for the single-chain and two-chain forms of the proteinase homology domains of HGFl/MSP and HGF/SF (see for example [21]).

Implication for biological activity

The high-resolution crystal structure of the two-chain form of the serine proteinase homology domain of HGFl/MSP reported here and the model of the corresponding single-chain (precursor) form discussed above have highlighted a role for the opening of the S1 pocket and a rearrangement of loops L8, L11 and L13 in domain activation (Fig. 3A). Given the conservation of HGF/SF and HGFl/MSP, these results imply that domain activation may involve similar changes in HGF/SF.

However, the binding affinity of the β-chain of HGFl/MSP for the RON receptor (Kd ≈ 10⁻⁹ M) [19] is approximately 100-fold higher than the affinity of the β-chain of HGF/SF for MET (Kd ≈ 10⁻⁷ M) [20]. Does this imply that the β-chain domain of HGFl/MSP uses a larger binding interface than the one defined for the HGF/SF domain? A conclusive answer to this question awaits cocrystal structures of ligand-receptor complexes and more extensive mutagenesis data, but the available evidence argues against a more extensive interface. The binding affinity of the β-chain of HGF/SF for MET is weak (Kd ≈ 10⁻⁷ M) and yet the binding surface is very extensive [20,21] and could readily allow the extra hydrogen bonding required to bring the affinity into the nanomolar range in the case of HGFl/MSP and RON.
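To put the two dissociation constants on an energetic scale, the sketch below converts each Kd into a standard binding free energy with ΔG° = RT ln(Kd) and takes the difference. The Kd values are the approximate ones quoted in the text; the temperature of 298.15 K is an assumption (the text does not state measurement conditions), and the function and variable names are illustrative only.

```python
import math

R_KCAL = 1.987204e-3   # gas constant in kcal/(mol*K)
TEMPERATURE_K = 298.15 # assumed temperature; not given in the text

def binding_free_energy(kd_molar, temperature=TEMPERATURE_K):
    """Standard binding free energy (kcal/mol) from a dissociation
    constant expressed in mol/L, relative to a 1 M standard state."""
    return R_KCAL * temperature * math.log(kd_molar)

kd_msp_ron = 1e-9  # β-chain of HGFl/MSP binding RON (approximate, from the text)
kd_hgf_met = 1e-7  # β-chain of HGF/SF binding MET (approximate, from the text)

fold_difference = kd_hgf_met / kd_msp_ron
# Negative value: the HGFl/MSP-RON interaction is the more favourable one.
delta_delta_g = binding_free_energy(kd_msp_ron) - binding_free_energy(kd_hgf_met)
```

Under these assumptions the 100-fold difference in Kd corresponds to a difference of roughly 2.7 kcal/mol in binding free energy, a modest amount that is compatible with the argument that a few additional favourable contacts within a similar interface, rather than a larger interface, could account for the higher affinity.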

There are indications from the cocrystal structure of the HGF/SF β-chain – MET complex that the β-chain of HGF/SF may mediate domain dimerization [20] and, as a result, dimerization of two 1 : 1 HGF/SF-MET complexes to form an active signaling unit. This hypothesis is based on the presence of conserved intermolecular interactions involving the N-terminal sequence and residues from loops L8 (c140) and L11 (c180) in the crystal structures of the β-chain of HGF/SF alone [21] or in complex with the β-propeller domain of MET [20]. Given the structural conservation of the β-chains of HGF/SF and HGFl/MSP and their receptor-binding surfaces ([20,21] and Fig. 3B), we analyzed the intermolecular interactions due to crystal packing in the structure of the HGFl/MSP β-chain (Fig. 4). Each molecule buries 2192 Å² of surface area in contacts with six adjacent molecules through loops L4, L5, L11, L13 and helix 1, but the diverse contacts seen in the crystal structure of the β-chain of HGFl/MSP (Fig. 4) do not include the one seen in the HGF/SF structures [20,21]. These results therefore do not support a functional role for the contact seen in the HGF/SF structures, or for any other set of contacts for that matter, although final elucidation of this important point will clearly require cocrystal structures of full-length HGFl/MSP or HGF/SF in complex with the RON or MET receptor and/or mutagenesis experiments.

Figure 4.

Intermolecular contacts in the crystal structure of the β-chain of HGFl/MSP. The reference molecule is shown in yellow and the six molecules making contacts with it are numbered. The N-terminal residues (V484) of the reference molecule and molecule 6 are indicated and the ‘active site’ residues of both molecules [Q522 (c57), Q568 (c102) and Y661 (c195)] are shown as red spheres. The secondary structure elements involved in the intermolecular contacts are also indicated, together with the surface area of the reference molecule buried in contacts with adjacent ones. The latter was calculated using the protein–protein interactions server and the figure was generated with PyMOL.

All the structural and biochemical data currently available provide strong evidence for the concept that the serine proteinase homology domains of HGF/SF and HGFl/MSP have retained the proteolytic mechanism of activation and the enzyme-substrate mode of binding seen in the catalytically active serine proteinases. Although it is well known that the large α-chains of the complex proteinases of the clotting and fibrinolytic cascades play a role in substrate binding, it is equally clear that the serine proteinase domains themselves can bind substrates and cofactors through regions outside the S1 specificity pocket (reviewed in [26]). This versatility of the serine proteinase domain in mediating further protein–protein interactions may also be at work with the plasminogen-related growth factors, and the structure of the β-chain of HGFl/MSP reported here should facilitate further analysis not only of the RON binding site but also of other areas of the domain possibly involved in receptor oligomerization or interaction with coreceptor molecules.

Experimental procedures

Protein expression

The Cys672Ser mutant of HGFl/MSP used for these studies was obtained by site-directed mutagenesis using the Pfu Turbo polymerase (Stratagene, La Jolla, CA, USA) and the following mutagenic oligonucleotides: 5′-TGCTTTACCCACAACTCATGGGTCCTGGAAGGA-3′ and 5′-TCCTTCCAGGACCCATGAGTTGTGGGTAAAGCA-3′. Plasmid pBSKSII containing the mutated cDNA was used as a template for PCR amplification of the β-chain MSP fragment using primers 5′-CGGGATCCCAGTTTGAGAAGTGTGGCAAGAGGG-3′ and 5′-AGCTCTCTAGAATCTACTAGTGGTGATGATGGTGATGACCCAGTCTCATGACCT-3′. The primers introduce a BamHI restriction site at the AUG and a His6 tag and an XbaI restriction site 3′ of the new stop codon. The PCR product was digested with BamHI and XbaI, N-terminally fused to the leader sequence of a human immunoglobulin variable chain (VL-HuLys11) [27] and subcloned into plasmid pA71d at a unique SmaI site. NS0 cells were grown in Dulbecco's modified Eagle's medium supplemented with 10% (v/v) fetal bovine serum. For stable expression of β-chain MSP, 1.5 × 10⁷ cells were transfected with 10 µg of linearized β-chain MSP cDNA by electroporation and placed in complete medium with 0.8% (v/v) hygromycin B (Stratagene, La Jolla, CA, USA). After selection and screening for expression by slot blot, single wells were cloned, and lines with the highest expression levels were selected. Production of the large quantities of protein required for crystallization experiments was carried out in roller bottles in 20 L batches in Dulbecco's modified Eagle's medium supplemented with 1.25% (v/v) fetal bovine serum.
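The design features of the oligonucleotides described above can be checked directly from their sequences. The sketch below is an illustrative sanity check, not part of the original protocol: it tests that the two mutagenic oligonucleotides are exact reverse complements, that the PCR primers carry the BamHI (GGATCC) and XbaI (TCTAGA) recognition sites, and that the reverse primer encodes six histidine codons followed by a stop codon on the coding strand. Reading the forward mutagenic oligonucleotide in frame from its first base is an assumption made here for illustration.

```python
# Sequences copied from the text (5' to 3').
MUT_FWD = "TGCTTTACCCACAACTCATGGGTCCTGGAAGGA"
MUT_REV = "TCCTTCCAGGACCCATGAGTTGTGGGTAAAGCA"
PCR_FWD = "CGGGATCCCAGTTTGAGAAGTGTGGCAAGAGGG"
PCR_REV = "AGCTCTCTAGAATCTACTAGTGGTGATGATGGTGATGACCCAGTCTCATGACCT"

BAMHI_SITE = "GGATCC"
XBAI_SITE = "TCTAGA"
# Six histidine codons (CAT/CAC) followed by a TAG stop codon.
HIS6_THEN_STOP = "CATCACCATCATCACCAC" + "TAG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

checks = {
    "mutagenic pair is complementary": reverse_complement(MUT_FWD) == MUT_REV,
    "BamHI site in forward PCR primer": BAMHI_SITE in PCR_FWD,
    "XbaI site in reverse PCR primer": XBAI_SITE in PCR_REV,
    "His6 + stop on coding strand": HIS6_THEN_STOP in reverse_complement(PCR_REV),
}

# Sixth codon of the forward mutagenic oligonucleotide, assuming the
# reading frame starts at its first base (an assumption, see lead-in).
central_codon = MUT_FWD[15:18]

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'FAILED'}")
print("central codon:", central_codon)
```

Under the stated frame assumption the central codon is TCA, a serine codon, which is consistent with the Cys672Ser substitution.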

Purification and crystallization

Cell cultures were harvested by centrifugation and the supernatant dialyzed against phosphate-buffered saline prior to loading onto an IMAC Ni-NTA Superflow column (Qiagen, Hilden, Germany) for partial purification. Subsequent cation exchange chromatography on a MonoS column (Amersham Biosciences UK Ltd., Chalfont St Giles, Bucks, UK) with a NaCl gradient yielded over 95% pure HGFl/MSP β-chain. Deglycosylation of the purified protein was performed by overnight incubation at 30 °C with the glycoamidase PNGaseF at a 1 : 25 (w/w) enzyme to substrate ratio. After final gel filtration purification of the reaction mixture, the HGFl/MSP β-chain was concentrated to 10 mg·mL⁻¹ in a buffer containing 20 mM MES pH 6.0 and 100 mM NaCl, and crystallized by the sitting drop vapour diffusion method. The crystallization drops contained 1 µL of protein mixed with 1 µL of precipitant solution and were equilibrated against 0.75 mL of the latter in 24-well plates (Molecular Dimensions Ltd, Soham, Cambridge, UK) at 19 °C. The initial crystallization condition, corresponding to condition #27 from Wizard Screen I™ (Emerald Biosystems, Bainbridge Island, WA, USA), was then optimized by varying the concentrations of crystallizing agents. The best crystals had the appearance of thick rods with a square cross-section, grew in 1.4 M NaH₂PO₄/0.93 M K₂HPO₄, CAPS pH 10.5, 0.1 M Li₂SO₄ (final pH 6.1) and reached a maximum size of 350 × 40 × 40 µm in 24 h.

X-ray data collection

Fully grown crystals of HGFl/MSP β-chain were soaked for 5–10 s in a cryoprotectant solution containing 20% glycerol in the precipitant solution listed above. After soaking, the crystals were mounted into rayon cryo-loops (Molecular Dimensions Ltd), flash-cooled in liquid nitrogen and stored in liquid nitrogen until used for X-ray diffraction experiments at the ESRF synchrotron in Grenoble, France (beam station ID 13). The diffraction data were recorded using a Quantum4 CCD detector (Area Detector Systems Corp., Poway, CA, USA) and were indexed, integrated, scaled and reduced using the HKL diffraction data processing suite [28]. All subsequent calculations were carried out using the CCP4 crystallographic suite [29]. The crystals of the HGFl/MSP β-chain diffracted to a maximum resolution of 1.85 Å; crystallographic data collection statistics are given in Table 1.

Structure solution and refinement

Calculation of the Matthews coefficient [29] suggested the presence of only one molecule of the HGFl/MSP β-chain in the asymmetric unit, resulting in a solvent content of about 50%. The structure was solved by molecular replacement using the crystal structure of microplasmin (PDB accession code: 1BUI, 39% sequence identity) [22] as a search probe. Molecular replacement calculations were performed with AMoRe [30]. The rotation function produced a clear peak with a signal-to-noise ratio of 0.7 σ (the resolution range 8–3 Å was used for all calculations). This in turn produced a clear peak in the translation function, with a correlation coefficient and Rcryst between observed and calculated structure factor amplitudes of 25.4% and 54.9%, respectively. Rigid body refinement performed in AMoRe improved the correlation coefficient and the R-factor to 37% and 52.7%, respectively.
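The Matthews coefficient estimate mentioned above can be reproduced approximately from the unit cell in Table 1. The sketch below is illustrative and rests on stated assumptions: the cell is orthorhombic with four asymmetric units (space group P2₁2₁2₁), one molecule is present per asymmetric unit, the molecular mass is taken as the nominal 30 kDa quoted for the β-chain, and the solvent fraction uses the common approximation 1 − 1.23/V_M. It is not the calculation performed with the CCP4 suite.

```python
def matthews_coefficient(a, b, c, asu_per_cell, molecules_per_asu, mass_da):
    """Matthews coefficient V_M (Å^3/Da) for an orthorhombic cell,
    where the cell volume is simply a * b * c."""
    cell_volume = a * b * c
    return cell_volume / (asu_per_cell * molecules_per_asu * mass_da)

def solvent_fraction(v_m):
    """Approximate solvent fraction from V_M (1.23 Å^3/Da assumes a
    typical protein partial specific volume)."""
    return 1.0 - 1.23 / v_m

# Unit cell from Table 1 (Å); P2(1)2(1)2(1) has four asymmetric units per cell.
A, B, C = 75.68, 75.85, 47.97
ASU_PER_CELL = 4
NOMINAL_MASS_DA = 30_000  # assumption: nominal 30 kDa from the text

v_m = matthews_coefficient(A, B, C, ASU_PER_CELL, 1, NOMINAL_MASS_DA)
solvent = solvent_fraction(v_m)

# Two molecules per asymmetric unit would leave almost no room for solvent.
v_m_two = matthews_coefficient(A, B, C, ASU_PER_CELL, 2, NOMINAL_MASS_DA)
solvent_two = solvent_fraction(v_m_two)
```

With these inputs V_M is about 2.3 Å³/Da and the solvent fraction is in the mid-40% range; a somewhat lower mass for the deglycosylated protein would shift the estimate toward the figure of about 50% given in the text. Two molecules per asymmetric unit would give a physically implausible solvent fraction, which supports the single-molecule assignment.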

The initial model obtained was then subjected to several rounds of crystallographic refinement using the CNS refinement package [31] and manual rebuilding. Simulated annealing protocols as implemented in CNS were used in the first rounds of refinement and were replaced with a Powell minimization protocol in the last rounds. The temperature factor refinement consisted of restrained individual B-factor refinement. Manual rebuilding was performed in the XtalView suite [32] using sigmaA-weighted 2Fo-Fc, Fo-Fc and annealed omit maps. The automated model-rebuilding program ARP/wARP [33] was also employed in the early stages of refinement to aid the manual rebuilding procedure. Most water molecules were picked using the XtalView internal subroutine and additional ones were placed manually using the following criteria: a peak of at least 2.5 σ in an Fo-Fc map, a peak of at least 1 σ in a 2Fo-Fc map, and reasonable intermolecular interactions. Final refinement statistics are shown in Table 1.
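The refinement statistics reported in Table 1 follow the definition Rcryst = Σ||Fobs| − |Fcalc||/Σ|Fobs|, with Rfree computed in the same way over the reflections excluded from refinement. The sketch below illustrates that definition on a handful of made-up amplitudes; the data, the `free_flags` list and the function names are hypothetical and are not taken from the deposited structure factors or from the CNS or REFMAC implementations.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor: sum of | |Fobs| - |Fcalc| | divided
    by the sum of |Fobs| over the supplied reflections."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

def split_r_factors(f_obs, f_calc, free_flags):
    """Return (R_work, R_free) given a boolean flag per reflection
    marking membership of the test set excluded from refinement."""
    work = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(o, c) for o, c, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([o for o, _ in work], [c for _, c in work])
    r_free = r_factor([o for o, _ in test], [c for _, c in test])
    return r_work, r_free

# Hypothetical amplitudes for illustration only.
f_obs = [100.0, 50.0, 25.0, 80.0]
f_calc = [90.0, 55.0, 25.0, 60.0]
free_flags = [False, False, False, True]  # last reflection is in the test set

r_all = r_factor(f_obs, f_calc)
r_work, r_free = split_r_factors(f_obs, f_calc, free_flags)
```

In a real data set the test set is a random selection of around 5% of the reflections, as noted in the footnotes to Table 1, and a gap of a few percentage points between the working and free R-factors, as seen there (19.8% and 23.7%), is typical of a model that is not overfitted.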


Coordinates and structure factors have been deposited in the Protein Data Bank, accession code 2ASU.


Work in EG's laboratory is supported by MRC Programme Grant G9704528. TLB thanks the Wellcome Trust Programme Grant (046073) and the BBSRC Structural Biology Initiative for support. Lauris Kemp and David Pratt are gratefully acknowledged for critical reading of the manuscript.