James A. Huntington, Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Wellcome Trust/MRC Building, Hills Road, Cambridge CB2 0XY, UK. Tel.: +44 1223 763230; fax: +44 1223 336827. E-mail: email@example.com
Summary. Blood coagulation is the result of a cascade of zymogen activation events; however, its initiation is allosteric. Factor VIIa circulates in a zymogen-like state and is allosterically activated by binding to tissue factor. Thrombin, the final protease generated in the blood coagulation cascade, has also been shown to exist in a low-activity state in the absence of cofactors, and the structural features of this ‘slow’ form have been studied for many years. In this manuscript, I will review the general features that render zymogens inactive and how proteolytic cleavage results in activation, but I will also show how this distinction is blurred by zymogens that have activity (protease-like zymogens) and proteases with low activity (zymogen-like proteases). This framework will then be applied to the analysis of slow thrombin to reveal how allosteric activation of thrombin simply reflects the conversion from a zymogen-like enzyme to an active serine protease.
The mechanism by which serine proteases hydrolyze peptide bonds had been worked out in great detail by the mid-seventies. Any discussion of what makes a zymogen inactive, or of what renders a protease zymogen-like, requires a brief review of the steps and residues involved in proteolysis. All of the proteases involved in the blood coagulation cascade belong to the chymotrypsin family (S1 in the Merops Database, http://merops.sanger.ac.uk/index.htm) and share a fold of two β-barrel domains, with the active site situated between them. The classic representation of the conserved serine protease structure is shown in Fig. 1A, with the active site facing the viewer and substrates binding from left to right, N- to C-terminus. For simplicity, a template numbering scheme based on chymotrypsinogen is typically used for the catalytic domain of all chymotrypsin family members. The catalytic architecture is composed of a primary specificity pocket (the S1 pocket, where the substrate P1 residue binds), the catalytic triad of His57, Asp102 and Ser195, and the oxyanion hole formed by the main chain amides of Gly193 and Ser195 (Fig. 1B).
Catalysis proceeds via several well-defined steps: (i) the P1 residue docks into the S1 pocket, aligning the scissile P1–P1′ peptide bond with the catalytic residues; (ii) the Oγ of Ser195 makes a nucleophilic attack on the carbonyl carbon of the P1 residue, potentiated by transfer of the hydroxyl proton to His57 (delocalized by Asp102); (iii) a tetrahedral transition state is formed, and the negative charge on the P1 carbonyl oxygen is stabilized by hydrogen bonds to the main chain amides of Gly193 and Ser195 (the oxyanion hole); (iv) a proton is transferred from His57 to the amide nitrogen (the new N-terminus) of the P1′ residue as the tetrahedral transition state collapses to the trigonal acyl-enzyme intermediate, breaking the P1–P1′ peptide linkage; (v) the P′ region diffuses out of the active site and a single water molecule diffuses in and hydrogen bonds to His57; (vi) His57 withdraws a proton from the water as its oxygen atom attacks the carbonyl carbon of the acyl intermediate, forming the second tetrahedral transition state, again stabilized by the oxyanion hole; (vii) the tetrahedral transition state collapses with transfer of the proton from His57 to Ser195; and (viii) the peptide containing the P1 residue diffuses out of the S1 pocket to complete the cycle. Thus, for a serine protease to be active, it must have a preformed substrate binding pocket (the S1 site being of principal importance) and precisely positioned catalytic triad and oxyanion hole residues.
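The eight steps can be made concrete with a small toy model. This is a hypothetical Python sketch (the names `State`, `run_cycle` and `completes_cycle` are illustrative, not from any library) that walks one turnover through the intermediates and halts at the first step the active site cannot support. It assumes, as the paragraph concludes, that substrate docking needs a formed S1 pocket and that both tetrahedral steps need a functional triad and oxyanion hole; it is qualitative bookkeeping, not a kinetic model.

```python
from enum import Enum, auto


class State(Enum):
    """Intermediates of one turnover, keyed to steps (i)-(viii) in the text."""
    FREE_ENZYME = auto()        # empty active site
    MICHAELIS_COMPLEX = auto()  # (i) P1 docked in S1; scissile bond aligned
    TETRAHEDRAL_1 = auto()      # (ii)-(iii) Ser195 attack; oxyanion held by Gly193/Ser195 NH
    ACYL_ENZYME = auto()        # (iv) His57 protonates the P1' amine; P1-P1' bond broken
    ACYL_WATER = auto()         # (v) P' fragment gone; a water hydrogen bonds to His57
    TETRAHEDRAL_2 = auto()      # (vi) water attacks the acyl carbonyl; oxyanion hole again
    PRODUCT_COMPLEX = auto()    # (vii) His57 returns the proton to Ser195


# Forward order of the cycle (definition order of the enum); step (viii),
# release of the P1 peptide, returns the enzyme to FREE_ENZYME.
CYCLE = list(State)


def run_cycle(s1_formed: bool, triad_formed: bool, oxyanion_hole_formed: bool) -> list[State]:
    """Walk one turnover, stopping at the first state the active site cannot support."""
    path = [State.FREE_ENZYME]
    for nxt in CYCLE[1:] + [State.FREE_ENZYME]:
        if nxt is State.MICHAELIS_COMPLEX and not s1_formed:
            break  # substrate cannot dock without a formed S1 pocket
        if nxt in (State.TETRAHEDRAL_1, State.TETRAHEDRAL_2) and not (
            triad_formed and oxyanion_hole_formed
        ):
            break  # no activated nucleophile or no oxyanion stabilization
        path.append(nxt)
    return path


def completes_cycle(path: list[State]) -> bool:
    """True if the walk got all the way back to the free enzyme."""
    return len(path) > 1 and path[-1] is State.FREE_ENZYME
```

A zymogen-like active site, with the triad formed but neither the S1 pocket nor the oxyanion hole, never leaves the free-enzyme state: `run_cycle(False, True, False)` returns `[State.FREE_ENZYME]`.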
Why zymogens are inactive
There are more than twenty structures of zymogens of serine proteases deposited in the Protein Data Bank. The first two, chymotrypsinogen and trypsinogen [4,5], were published in the 1970s and showed interesting differences of potential functional importance. All residues of chymotrypsinogen were well defined in the electron density, whereas in trypsinogen the loops comprising residues 142–152, 184A–193 and 216–223 could not be modeled, presumably because of the flexibility of these regions. Although these loops were defined in the chymotrypsinogen structure, their conformations were quite different from those of the active enzyme (Fig. 2A). The catalytic triad was correctly formed in both structures (Fig. 2B), indicating that the proton transfer system is functional in zymogens. This is consistent with the ability of some zymogens to react with small substrates and inhibitors [6–8]. Although different in detail, both structures revealed an obscured S1 pocket and an incorrectly formed oxyanion hole (Fig. 2B). These two features are sufficient to account for the catalytic deficiency of zymogens toward peptide and natural substrates and inhibitors. Subsequent crystal structures of zymogen forms of chymotrypsin family serine proteases have helped to refine the general features that account for inactivity, but have also revealed many interesting variations. No two zymogen structures are identical, even when multiple copies of the same zymogen share the asymmetric unit, indicating that some of the defined structural features may be crystallization artefacts and suggesting an underlying flexibility of certain loops.
It is clear that there is more than one way to confer inactivity upon a zymogen, but enough structural information exists to arrive at three general rules: (i) the S1 pocket is inaccessible or not formed; (ii) Gly193 is either flexible or in a conformation that cannot stabilize the tetrahedral oxyanion; and (iii) three loops (excluding the activation loop itself), 142–153, 184A–194 and 215–225, are either disordered or in a conformation significantly different from that of the active enzyme. These loops have been called the ‘activation domain’, but here I will refer to them as ‘activation loops’ or ‘zymogen loops’.
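Because the template numbering uses insertion letters (184A, 221A), checking whether a residue falls in one of the three loops needs a sort key that places 184A after 184 and before 185. The following hypothetical sketch (`residue_key`, `loop_containing`, `ActiveSite` and `zymogen_like` are illustrative names) encodes that ordering, the loop boundaries listed above, and the three rules as a simple predicate. It assumes single-letter insertion codes and is not a substitute for structural analysis.

```python
import re
from dataclasses import dataclass
from typing import Optional

_LABEL = re.compile(r"(\d+)([A-Z]?)")


def residue_key(label: str) -> tuple[int, str]:
    """Sort key for template residue labels such as '189' or '184A'.

    Insertion residues carry a letter suffix and sort after the bare number,
    so '184' < '184A' < '184B' < '185'.
    """
    m = _LABEL.fullmatch(label)
    if m is None:
        raise ValueError(f"not a template residue label: {label!r}")
    return int(m.group(1)), m.group(2)


# The three zymogen loops as listed in the text (activation loop itself excluded).
ACTIVATION_LOOPS = {
    "142-153": ("142", "153"),
    "184A-194": ("184A", "194"),
    "215-225": ("215", "225"),
}


def loop_containing(label: str) -> Optional[str]:
    """Name of the zymogen loop containing this residue, or None."""
    key = residue_key(label)
    for name, (start, end) in ACTIVATION_LOOPS.items():
        if residue_key(start) <= key <= residue_key(end):
            return name
    return None


@dataclass
class ActiveSite:
    s1_pocket_formed: bool         # rule (i)
    oxyanion_hole_formed: bool     # rule (ii): Gly193 NH able to stabilize the oxyanion
    activation_loops_native: bool  # rule (iii): all three loops in the active conformation


def zymogen_like(site: ActiveSite) -> bool:
    """True if any of the three general rules for zymogen inactivity applies."""
    return not (
        site.s1_pocket_formed
        and site.oxyanion_hole_formed
        and site.activation_loops_native
    )
```

For example, `loop_containing("186A")` returns `"184A-194"`, whereas residue 184 falls outside every loop.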
There are two deposited structures of thrombin in a zymogen state, prethrombin-2 (pre2): human pre2 bound to the exosite I ligand hirugen, and bovine pre2 crystallized under two different conditions but in the absence of any ligand. These structures conform to what was already observed for trypsinogen and chymotrypsinogen, with the zymogen activation loops either fully disordered or ordered, with high temperature factors, in a conformation distinct from that of the active protease. As one might expect, it is the hirugen-bound human pre2 that possesses ordered zymogen loops, and the apo bovine pre2 that has disordered loops. In bovine pre2, the disordered regions are 141–146, 186A–188 and 217–224 (set at an occupancy value of 0.01 in the coordinate file), and the residues on either side of these loops are clearly in conformations different from those of active thrombin. As in most other zymogen structures, the catalytic triad is correctly formed, but the S1 pocket and the oxyanion hole are not.
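Residues modeled with near-zero occupancy, like the bovine pre2 loops described above, can be picked out of a coordinate file directly. A minimal sketch, assuming standard fixed-column PDB ATOM records (chain ID in column 22, residue number in columns 23–26, insertion code in column 27, occupancy in columns 55–60). The function names are hypothetical, and the demonstration records built by `atom_record` use dummy coordinates, not data from any deposited entry.

```python
def low_occupancy_residues(pdb_lines, max_occupancy=0.01):
    """Return (chain, residue label) pairs whose ATOM records all have
    occupancy <= max_occupancy, in file order.

    Assumes fixed-column PDB records (1-based, inclusive): chain ID in
    column 22, residue number in 23-26, insertion code in 27, occupancy in 55-60.
    """
    occupancies: dict[tuple[str, str], list[float]] = {}
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skip HETATM (waters, ligands), headers, etc.
        chain = line[21]
        label = line[22:26].strip() + line[26].strip()  # e.g. "186" or "186A"
        occupancies.setdefault((chain, label), []).append(float(line[54:60]))
    return [key for key, occs in occupancies.items() if max(occs) <= max_occupancy]


def atom_record(serial, atom, res_name, chain, res_seq, i_code, occupancy):
    """Build a synthetic ATOM record with dummy coordinates (demonstration only)."""
    return "ATOM  %5d %-4s %3s %1s%4d%1s   %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, atom, res_name, chain, res_seq, i_code,
        0.0, 0.0, 0.0, occupancy, 30.0,
    )
```

Requiring every atom of a residue to fall below the threshold avoids flagging residues where only a side chain was given reduced occupancy.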
Chymotrypsin-family zymogens are activated by cleavage of the bond between residues 15 and 16; for serine proteases involved in hemostasis, this bond is Arg–Ile (or Val). The new primary amine at the N-terminus of Ile16 becomes positively charged upon cleavage and, through a mechanism referred to as molecular sexuality, inserts into a hydrophobic pocket formed by the zymogen activation loops (sometimes called the activation pocket), where it is stabilized by a salt bridge with Asp194. The contacts between the hydrophobic N-terminus and the zymogen activation loops are detailed in Fig. 1C and are highly conserved among chymotrypsin family members. The effect of the insertion of the new N-terminus on the overall structure or flexibility of the protein is limited to the zymogen activation regions described above (see Fig. 2A). The activating conformational changes are therefore quite local to Ile16, and one may question whether they are sufficiently remote to constitute allostery. In any case, cleavage after residue 15 generally results in the ordering of the catalytic residues and the formation of the S1 pocket. How this is achieved is quite easy to envision. The hydrophobic side chains at positions 16 and 17 are driven out of the aqueous environment toward the cavity, while complementary electrostatics orient the side chain of Asp194 internally toward the pocket. As a consequence, Cys191 is repositioned, allowing the main chain of Asp189 to make a double hydrogen bond to residue 17. This event effectively anchors the catalytic loop from 194 to 189, rigidifying the oxyanion hole and forming the S1 pocket (the side chain of residue 189 forms the base of the pocket). Cys220 is linked to the active site loop through a disulfide bond with Cys191, and its repositioning allows residues 220 and 221A to make van der Waals contacts with Val17. The final zymogen loop, 141–152, is stabilized by direct main chain hydrogen bonds between residues 142 and 143 and the new N-terminus (see Fig. 1C).
Protease-like zymogens and zymogen-like proteases
Most of the conformational changes referred to above involve the stabilization of flexible regions critical for the catalytic function of the protease. This is because the zymogen loops in most structures are disordered, or at least sample multiple conformations. It is therefore only a minor simplification to regard the flexibility of the zymogen loops as the general basis for zymogen inactivity. This conceptually decouples activation from cleavage of the Arg15–Ile16 peptide bond, as stabilization of these loops is all that is required. This is nicely illustrated by the activation of zymogens by strong ligands, by high salt and by staphylocoagulase. In addition, there are zymogens that exhibit appreciable catalytic activity in the absence of other activator molecules (protease-like zymogens), and proteases that exist in a catalytically inert state despite being cleaved at the Arg15–Ile16 bond (zymogen-like proteases). The best examples of a protease-like zymogen and a zymogen-like protease are single-chain tissue plasminogen activator (tPA) and factor VIIa [17,18], respectively. The activity of the zymogen form of tPA has been shown to be due in part to the insertion of the side chain of Lys156 into the activation pocket [16,19], with its ammonium group forming an analogous salt bridge with Asp194. However, single-chain tPA is still an effective enzyme when Lys156 is mutated, indicating that its zymogen activation loops are inherently more stable than those of typical zymogens. Conversely, the activation loops are likely to be inherently more disordered in fVIIa, as its N-terminus is partially exposed unless the cofactor tissue factor (TF) is bound. Owing to space constraints, I am not able to go into more detail about either of these fascinating and important enzymes, but the take-home message is that zymogenicity is related to the flexibility/stability of the activation loops.
‘Slow’ thrombin is zymogen-like
For about 30 years now, researchers have been studying the effect of Na+ binding on the activity and structure of thrombin (for a thorough review see Ref. 20). Thrombin is not unique among coagulation proteases in its ability to coordinate Na+, nor in expressing greater activity in its presence than in its absence (e.g., factors VIIa, Xa and IXa, and activated protein C). However, Na+ is considered a cofactor for thrombin because its Kd at 37 °C is reported to be similar to the concentration of Na+ in the blood, so the two forms should be roughly equally populated in vivo. Na+ binding leads to a general improvement in catalytic efficiency (e.g., 10-fold faster cleavage of the chromogenic substrate S2238), including toward procoagulant substrates such as fibrinogen and PAR-1; the Na+-bound form has therefore become known as the pro-coagulant ‘fast’ form. Conversely, Na+-free thrombin is a poor enzyme, except when bound to thrombomodulin (TM), which effectively alters its specificity from pro- to anticoagulant substrates; it is thus known as the anti-coagulant ‘slow’ form. This distinction appears to be supported by animal studies showing that infused recombinant slow thrombin is indeed an anticoagulant, and that this effect is mediated by the protein C pathway. Thrombin coordinates Na+ using the main chain oxygens of Arg221A and Lys224 and four water molecules (Fig. 3A). These and adjacent residues of the 220-loop contact the 186-loop, the 147-loop and Val17. In other words, Na+ binding would appear to influence the stability of the entire zymogen activation domain. The ‘slow/fast’ nomenclature is specific to thrombin, and this perhaps explains why efforts to determine the molecular basis of the low-activity apo state have ignored the obvious link to zymogenicity. Is it not possible that, similar to fVIIa, full conversion from the zymogen form of thrombin to its active form requires the additional step of cofactor binding (either Na+ or TM)? To answer this question, it is necessary to determine the structure of ‘slow’ thrombin.
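The claim that the two forms should be about equally populated follows from simple single-site binding: the Na+-bound fraction is [Na+]/([Na+] + Kd), which equals 0.5 when [Na+] = Kd. The sketch below assumes a rapid two-state equilibrium and takes the roughly 10-fold enhancement quoted for S2238 as the fast/slow rate ratio. The 140 mM value in the tests is an illustrative plasma Na+ concentration, not a measured Kd, and the function names are hypothetical.

```python
def fraction_na_bound(na_mM: float, kd_mM: float) -> float:
    """Fraction of thrombin in the Na+-bound ('fast') form, assuming single-site,
    rapid-equilibrium binding: [Na+] / ([Na+] + Kd)."""
    if na_mM < 0 or kd_mM <= 0:
        raise ValueError("[Na+] must be non-negative and Kd positive")
    return na_mM / (na_mM + kd_mM)


def observed_rate(na_mM: float, kd_mM: float, k_slow: float = 1.0, fold_fast: float = 10.0) -> float:
    """Population-weighted rate, in units of k_slow.

    fold_fast is the assumed fast/slow rate ratio (about 10 for S2238 per the
    text; other substrates will differ).
    """
    f = fraction_na_bound(na_mM, kd_mM)
    return (1.0 - f) * k_slow + f * k_slow * fold_fast
```

Under these assumptions, thrombin at [Na+] = Kd cleaves S2238 about 5.5 times faster than the fully Na+-free enzyme, half way between the two limiting forms.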
Structural features of slow thrombin
Based on the assumption that the slow form was a distinct, well-defined conformational state of thrombin, researchers have attempted for many years to determine its structure by X-ray crystallography. Zhang and Tulinsky correctly identified the coordination site of Na+ by growing crystals in its presence and subsequently replacing it with the more electron-dense Rb+. As a natural corollary, they soaked out the Na+ and determined the structure of what they hoped would be thrombin in the slow state. Although Na+ was absent from the structure, the conformation was identical to that of the Na+-bound state. They concluded that crystal contacts may stabilize the fast state regardless of the occupancy of the Na+ binding site. Furthermore, the crystals were grown in the presence of the exosite I ligand hirugen, and exosite I binding is known to have the same conformational effect as Na+ binding (i.e., stabilization of the fast conformation). It was therefore clear that obtaining a crystal structure of the slow form would require non-liganded thrombin and a modicum of good luck.
Mutagenesis studies were initially more successful in defining the regions involved in Na+ binding and the associated conformational change. I recently published a review of these studies, classifying the importance of the residues according to their effect on Na+ affinity. The residues of medium to extreme importance (effects of 0.5 log units or more) are shown in Fig. 3B. With few exceptions, the residues involved in the Na+ activation of thrombin belong to the zymogen activation loops. This is perhaps unsurprising given the location of the Na+ binding site (Fig. 3B), but for exosite I ligands this is truly a long-range allosteric effect.
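To make the 0.5 log threshold concrete: if a mutation's effect is expressed as |log10(Kd,mutant/Kd,wild type)| (an assumption about how the classification was computed), 0.5 log units corresponds to a change in Na+ affinity of about 3.2-fold, or roughly 3 kJ/mol at 37 °C via ΔΔG = RT ln(Kd,mutant/Kd,wild type). The helper names below are hypothetical.

```python
import math

# Threshold quoted in the text for residues of medium to extreme importance.
MEDIUM_OR_ABOVE_LOG = 0.5
GAS_CONSTANT = 8.314462618e-3  # kJ mol^-1 K^-1


def log_effect(kd_mutant: float, kd_wild_type: float) -> float:
    """Size of a mutation's effect on Na+ affinity, |log10(Kd_mut / Kd_wt)|."""
    return abs(math.log10(kd_mutant / kd_wild_type))


def ddg_kj_per_mol(kd_mutant: float, kd_wild_type: float, temp_k: float = 310.15) -> float:
    """RT ln(Kd_mut / Kd_wt) at temp_k (default 37 °C); positive means weaker Na+ binding."""
    return GAS_CONSTANT * temp_k * math.log(kd_mutant / kd_wild_type)


def medium_or_above(kd_mutant: float, kd_wild_type: float) -> bool:
    """True if the effect reaches the 0.5 log threshold in either direction."""
    return log_effect(kd_mutant, kd_wild_type) >= MEDIUM_OR_ABOVE_LOG
```

Treating increases and decreases in affinity symmetrically is a further assumption of this sketch.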
Today the PDB contains several depositions of thrombin structures with ‘slow’ in their titles. I recently conducted a thorough analysis of these structures and concluded that the two copies of Na+-free thrombin in 1MH0 and in 1SGI (same crystal, different resolution) are indistinguishable from acknowledged fast forms, whereas a second class, represented by 2AFQ (wild-type), 2GP9 (D102N) and 1RD3 (E217K), is structurally distinct from fast thrombin. When the regions that differ significantly in these structures (Cα RMSD signal-to-noise ratio above 2) are plotted (Fig. 3C), the correspondence with the mutagenesis data (Fig. 3B) and the zymogen structures (Fig. 2A) is strikingly clear. The altered conformation of the zymogen loops in the slow thrombins has the same functional effect as in zymogens: the S1 pocket is blocked and the oxyanion hole is malformed, even though the N-terminal Ile16 is stably inserted in the activation pocket.
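The significance criterion (Cα RMSD signal-to-noise ratio above 2) can be sketched per residue as follows. This is a hypothetical reconstruction, not the published procedure: it assumes all structures are already superposed, takes the signal as the RMS Cα distance of the test structure from each reference fast structure, and takes the noise as the pairwise RMS Cα distance among the references themselves, with a small floor (`min_noise`, an arbitrary 0.1 Å) to avoid division by zero. All names are illustrative.

```python
import math
from itertools import combinations

Coord = tuple[float, float, float]
Structure = dict[str, Coord]  # residue label -> C-alpha position (already superposed)


def rms(values: list[float]) -> float:
    """Root mean square of a non-empty list."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def snr_per_residue(test: Structure, references: list[Structure],
                    min_noise: float = 0.1) -> dict[str, float]:
    """Signal/noise of each residue's C-alpha deviation from a reference ensemble."""
    ratios: dict[str, float] = {}
    for label, xyz in test.items():
        refs = [ref[label] for ref in references if label in ref]
        if len(refs) < 2:
            continue  # noise is undefined with fewer than two references
        signal = rms([math.dist(xyz, r) for r in refs])
        noise = rms([math.dist(a, b) for a, b in combinations(refs, 2)])
        ratios[label] = signal / max(noise, min_noise)
    return ratios


def significantly_different(test: Structure, references: list[Structure],
                            threshold: float = 2.0, min_noise: float = 0.1) -> list[str]:
    """Residues whose signal-to-noise ratio exceeds the threshold, in test order."""
    ratios = snr_per_residue(test, references, min_noise)
    return [label for label, s in ratios.items() if s > threshold]
```

Normalizing by the spread among independent fast structures means that loops which are simply mobile in every crystal are not flagged merely for being mobile.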
To some, it may seem odd for a 2009 State-of-the-Art article to review work that was carried out in the seventies, but the goal of this review is to place past and ongoing work on protease allostery (thrombin in particular) into the correct historical and scientific context. There are several examples of serine proteases in hemostasis that require cofactor binding to achieve full activation. In some cases, the enhancement is due to co-localization, while in others there is an accompanying allosteric activation event. To some degree, all coagulation proteases from fVIIa through to thrombin can be activated toward small substrates through either coordination of cations or macromolecular cofactor binding. Thus, the apo forms of these proteases have not completed the conversion to their fully active states and are therefore still zymogen-like to some degree. The question of physiological relevance rests on whether these low-activity states exist in and around the blood clot, or whether they are merely artefacts of contrived in vitro conditions.
I thank the Scientific Committee of ISTH 2009 for the invitation to contribute this manuscript. Funding for my work on thrombin is provided by the National Institutes of Health (HL68629), the British Heart Foundation (UK) and the Medical Research Council (UK).
Disclosure of Conflict of Interests
The author states that he has no conflict of interest.