Serine proteases

Authors

  • Enrico Di Cera

    Corresponding author
    • Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, Box 8231, St. Louis, MO 63110, USA
    • Tel: +314-362-4185. Fax: +314-362-4133.


Abstract

Over one third of all known proteolytic enzymes are serine proteases. Among these, the trypsins underwent the most extensive genetic expansion, yielding the enzymes responsible for digestion, blood coagulation, fibrinolysis, development, fertilization, apoptosis, and immunity. The success of this expansion resides in a highly efficient fold that couples catalysis and regulatory interactions. Added complexity comes from the recent observation of a significant conformational plasticity of the trypsin fold. A new paradigm emerges where two forms of the protease, E* and E, are in allosteric equilibrium and determine biological activity and specificity. © 2009 IUBMB. IUBMB Life 61(5): 510–515, 2009

PREAMBLE

In a typical genome, 2–4% of genes encode proteolytic enzymes (1). Among these, serine proteases emerged during evolution as the most abundant and functionally diverse group (2, 3). Because abundance is a measure of success in evolutionary terms, the molecular mechanisms that ensure catalysis and regulation in these enzymes deserve attention. Much has been learned about how serine proteases catalyze the hydrolysis of a peptide bond, and excellent reviews have covered this subject (4, 5). Surprisingly, however, the rules encoding specificity in a given protease fold remain incompletely understood (6). New developments from the structural investigation of a number of serine proteases have recently challenged old paradigms. Allostery, originally conceived to explain the properties of multisubunit enzymes (7), emerges as a key feature of the serine protease fold, especially evident in the trypsins. This new dimension changes our understanding of serine protease function, regulation, and specificity.

GENERAL PROPERTIES OF SERINE PROTEASES

Barrett and coworkers have devised a classification scheme based on statistically significant similarities in sequence and structure of all known proteolytic enzymes and termed this database MEROPS (8). The classification system divides proteases into clans based on catalytic mechanism and families on the basis of common ancestry. Over one third of all known proteolytic enzymes are serine proteases grouped into 13 clans and 40 families. The family name stems from the nucleophilic Ser in the enzyme active site, which attacks the carbonyl moiety of the substrate peptide bond to form an acyl-enzyme intermediate (4). Nucleophilicity of the catalytic Ser is typically dependent on a catalytic triad of Asp, His, and Ser residues, commonly referred to as the charge relay system (9). At least four distinct protein folds, as illustrated by trypsin, subtilisin, prolyl oligopeptidase, and ClpP peptidase, utilize the Asp-His-Ser catalytic triad in identical configuration to catalyze hydrolysis of peptide bonds (2). This is testimony to the success of the triad as a catalytic machinery in four distinct evolutionary pathways. Many serine proteases employ a simpler dyad mechanism where Lys or His is paired with the catalytic Ser. Other serine proteases mediate catalysis via novel triads of residues, such as a pair of His residues combined with the nucleophilic Ser. A summary of catalytic units in all serine protease families is given in Table 1.

Table 1. General properties of serine proteases

Clan  Families  Representative member      Catalytic residues  Primary specificity
PA    12        Trypsin                    His, Asp, Ser       A, E, F, G, K, Q, R, W, Y
SB    2         Subtilisin                 Asp, His, Ser       F, W, Y
SC    2         Prolyl oligopeptidase      Ser, Asp, His       G, P
SE    6         D-A, D-A carboxypeptidase  Ser, Lys            D-A
SF    3         LexA peptidase             Ser, Lys/His        A
SH    2         Cytomegalovirus assemblin  His, Ser, His       A
SJ    1         Lon peptidase              Ser, Lys            K, L, M, R, S
SK    2         Clp peptidase              Ser, His, Asp       A
SP    3         Nucleoporin                His, Ser            F
SQ    1         Aminopeptidase DmpA        Ser                 A, G, K, R
SR    1         Lactoferrin                Lys, Ser            K, R
SS    1         L,D-carboxypeptidase       Ser, Glu, His       K
ST    5         Rhomboid                   His, Ser            D, E

Note: Multiple members of each clan may give rise to a wide variety of primary specificities, as documented by the PA, SJ, and SQ clans.

Serine proteases are widely distributed in nature and found in all kingdoms of cellular life as well as many viral genomes. However, significant differences exist in the distribution of each clan across species. For example, clan PA proteases are highly represented in eukaryotes, but are rare constituents of prokaryotic and plant genomes. Vertebrates boast an array of clan PA proteases responsible for a variety of extracellular processes. SB and SC clans are most represented in other organisms. Serine proteases are usually endoproteases and catalyze bond hydrolysis in the middle of a polypeptide chain. However, several families of exoproteases have been described that remove one or more amino acids from the termini of target polypeptide chains. Within the metazoan lineage, a select subset of protease families underwent significant gene duplication and divergence (3). In fact, four protease families by themselves account for over 40% of all proteolytic enzymes in humans. These are the ubiquitin-specific proteases responsible for regulated intracellular protein turnover (10), the adamalysins controlling growth factors and integrin function (11), prolyl oligopeptidases (12), and the trypsin-like serine proteases (2), which are also the largest group of homologous proteases in the human genome. Of 699 proteases in humans, 178 are serine proteases and 138 of them belong to the S1 protease family. The abundance of S1 proteases suggests that the protein fold presents a selective advantage relative to other proteases. The trypsin fold of the S1 protease family (Fig. 1) combines catalytic efficiency, substrate selectivity, and multiple levels of regulation in a scaffold that is readily associated with other auxiliary protein modules. Given the scope of this review, we limit our discussion to PA proteases and the trypsins in particular. Properties of the members of other clans listed in Table 1 are discussed in detail elsewhere (2).

Figure 1.

Ribbon representation of trypsin (2PTN) (25), the prototypic example of a serine protease from family S1 in clan PA (Table 1), colored according to the spectrum from the N-terminus (violet) to the C-terminus (red). The catalytic triad (sticks) is hosted at the interface of two similar β-barrels. D189 (sticks) occupies the bottom of the primary specificity pocket and confers specificity toward Arg or Lys side chains at the P1 position of substrate.

CLAN PA PROTEASES

Clan PA proteases bearing the trypsin fold are the largest family of serine proteases and perhaps the best studied group of enzymes. Digestive enzymes such as trypsin and chymotrypsin cleave polypeptide chains at positively charged (Arg/Lys) or large hydrophobic (Phe/Trp/Tyr) residues, respectively. Most clan PA proteases have trypsin-like substrate specificity and prefer Arg or Lys side chains at the P1 position of substrate (13). A number of key biological processes rely on clan PA proteases. Chief among them are blood coagulation and the immune response, which involve cascades of sequential zymogen activation. In both systems, the protease domain is combined with one or more apple, CUB, EGF, fibronectin, kringle, sushi, and von Willebrand factor domains. These domains are present on the N-terminus as an extension of the propeptide segment of the protease and typically remain attached to the protease domain through a disulfide bond.
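The P1 preferences described above can be illustrated with a minimal in-silico digestion sketch. It is a deliberate simplification: it assumes cleavage after every residue that matches the P1 rule and ignores all interactions beyond the P1 position. The function name, the rule table, and the example peptide are hypothetical and serve for illustration only.

```python
# P1 preferences are taken from the text; the rest is an illustrative assumption.
P1_RULES = {
    "trypsin": set("RK"),        # Arg/Lys at P1
    "chymotrypsin": set("FWY"),  # Phe/Trp/Tyr at P1
}


def cleave(sequence, protease):
    """Split a one-letter peptide sequence after each residue matching the P1 rule."""
    p1 = P1_RULES[protease]
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        # Cleavage occurs C-terminal to P1; there is no bond after the last residue.
        if residue in p1 and i < len(sequence) - 1:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    fragments.append(sequence[start:])
    return fragments


peptide = "GAKLFDRSWE"  # hypothetical peptide, not from the source
print(cleave(peptide, "trypsin"))       # ['GAK', 'LFDR', 'SWE']
print(cleave(peptide, "chymotrypsin"))  # ['GAKLF', 'DRSW', 'E']
```

The same peptide yields different fragments with the two rule sets, which is the sense in which a single fold supports distinct primary specificities.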

Family S1 of clan PA is partitioned into two subfamilies, S1A and S1B, that are phylogenetically distinct but share a common two β-barrel architecture. The S1B proteases are found in all cellular life and are responsible for intracellular protein turnover. In contrast, the S1A proteases are the trypsins that mediate a variety of extracellular processes. S1A proteases have a limited distribution in plants, prokaryotes, and the archaea. In humans, they are phylogenetically grouped into six functional categories: digestion, coagulation and immunity, tryptase, matriptase, kallikrein, and granzymes. A variety of enzymes are involved in the breakdown of proteins in the digestive system (14). Trypsins, chymotrypsins, and elastases are endoproteases that break down polypeptides into shorter chains. Further digestion is mediated by a number of exoproteases (15, 16). Tryptases are major components in the secretory granules of mast cells that are unique among clan PA proteases because of their homotetrameric structure (17). Matriptases are membrane-bound S1 proteases bearing primary substrate selectivity similar to trypsin (18). Physiological substrates of this subfamily of proteases are largely unknown, but high gene expression levels for matriptases are associated with a variety of cancers (19, 20). Similar association with cancer has led to great interest in kallikreins (21), a large family better known for its role in regulation of blood pressure through the kinin system (22). Granzymes are mediators of directed apoptosis by natural killer cells and cytotoxic T cells that play key roles in the defense against viral infection (23). These enzymes are unique in their primary specificity for acidic residues.

THE CATALYTIC MECHANISM

Activation of many trypsin-like serine proteases requires proteolytic processing of an inactive zymogen precursor (24). Cleavage of the proprotein precursor occurs at the identical position in all known members of the family, that is, between residues 15 and 16 (chymotrypsinogen numbering). The nascent N-terminus induces conformational change in the enzyme through formation of an ion-pair with the highly conserved D194 that organizes both the oxyanion hole and substrate-binding site (25, 26). Alternatively, proteases like tissue-type plasminogen activator carry Lys at position 156 that engages D194 in an ion-pair that confers a catalytically competent fold without proteolytic cleavage at residue 15 (27).

Nearly all clan PA proteases utilize the canonical catalytic triad and hydrolyze the peptide bond via two tetrahedral intermediates (28). Initially, the hydroxyl O atom of the catalytic S195 attacks the carbonyl of the substrate peptide bond utilizing H57 as a general base. The oxyanion tetrahedral intermediate is stabilized by the backbone N atoms of G193 and S195, which together generate a positively charged pocket within the active site known as the oxyanion hole. H-bonding interactions in the oxyanion hole contribute 1.5–3.0 kcal/mol to ground and transition state stabilization (29). Collapse of the tetrahedral intermediate generates the acyl-enzyme intermediate and stabilization of the newly created N-terminus is mediated by H57. A water molecule displaces the free polypeptide fragment and attacks the acyl-enzyme intermediate. The oxyanion hole stabilizes the second tetrahedral intermediate of the pathway and collapse of this intermediate generates a new C-terminus in the substrate (4).
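To put the 1.5–3.0 kcal/mol figure in perspective, the sketch below converts a stabilization free energy into a fold change in a rate or equilibrium constant using exp(ΔG/RT). The temperature of 298.15 K and the attribution of the entire energy to a single constant are assumptions made for illustration, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def fold_change(delta_g_kcal, temperature_k=298.15):
    """Fold change in a rate or equilibrium constant for a free energy difference."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))


# Range quoted in the text for H-bonding in the oxyanion hole.
for dg in (1.5, 3.0):
    print(f"{dg:.1f} kcal/mol -> about {fold_change(dg):.0f}-fold")
```

Under these assumptions the quoted range corresponds to roughly a 13-fold to 160-fold effect, which shows how a few H-bonds can account for one to two orders of magnitude.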

PRIMARY SPECIFICITY

The trypsin fold is remarkable for the even distribution of catalytic residues across the entire polypeptide sequence (Fig. 1). Two six-stranded β-barrels come together asymmetrically to host at their interface the residues of the catalytic triad. H57 and D102 belong to the N-terminal β-barrel with S195 and the oxyanion hole (S195 and G193) hosted by the C-terminal β-barrel. Eight surface-exposed loops in the protein decorate the entrance to the active site. Within the same fold, trypsins prefer Arg/Lys side chains at the P1 position of substrate (13) but chymotrypsins prefer Phe/Tyr/Trp residues (2). The molecular origin of this difference resides in part in the architecture of the primary specificity pocket, shaped as a cavity about 10 Å deep and connected to the active site, at the bottom of which trypsins carry D189 and chymotrypsins carry S189. The simplicity of this arrangement belies the difficulty of swapping specificity between trypsin and chymotrypsin. Pioneering studies have shown that conversion of trypsin into chymotrypsin requires the expected D189S replacement, but also extensive additional replacements at loops not directly in contact with substrate (30). Despite these replacements, however, the conversion is never complete and enzyme activity in the mutant constructs remains poor. Furthermore, the same strategy fails in the reverse conversion of chymotrypsin into trypsin (31) or the conversion of trypsin into elastase (32). Recent mutagenesis studies have shown that the simple S189D replacement in chymotrypsin is sufficient to confer trypsin-like specificity when access to the primary specificity pocket is enhanced with the A226G substitution (33), a result that echoes the impressive recent successes in the redesign of primary specificity in the different protein scaffold of OmpT (34). Yet, the S189D/A226G mutant of chymotrypsin remains a poor protease, 5,000-fold slower than trypsin at cleaving substrates with Lys at the P1 position. Apparently, the structure-function link required to understand the molecular basis of serine protease specificity is missing a key element. The considerable conformational plasticity of the trypsin fold may hold important clues.

THE E* FORM

Recent kinetic studies on thrombin (35), the key serine protease of the blood coagulation cascade, vouch for an unexpected plasticity of the trypsin fold. Thrombin exists in three forms at equilibrium under physiological conditions. The Na+-free form E and the Na+-bound form E:Na+ are the low and high activity conformations of the enzyme, with E:Na+ being responsible for the procoagulant, prothrombotic, and signaling functions. A third form, E*, is in equilibrium with E and is basically inactive toward substrate and unable to bind Na+ (36). The structural differences between E and E:Na+ are subtle (37), almost disappointingly small considering the significant functional differences observed between the two forms in solution. However, enzymologists have long recognized that subtle short-scale rearrangements within the active site of an enzyme may lead to profound energetic changes in the ground and transition states (38, 39). On the other hand, E* differs dramatically from E and E:Na+ in its collapse of the 215–217 β-strand into the active site, a disruption of the oxyanion hole due to a flip of the peptide bond between E192 and G193 from a type II to a type I β-turn, and other changes around D189 and the Na+ site (40) (Fig. 2). Importantly, the ion-pair between I16 and D194 remains intact, suggesting that E* is not equivalent to the zymogen form of thrombin. Existing structures of the zymogen forms of trypsin (26), chymotrypsin (41), and chymase (42) all feature a broken I16-D194 ion-pair and do not document a collapse of the 215–217 β-strand. Stopped-flow experiments demonstrate that the E*–E conversion takes place on a time scale <10 msec (36), as opposed to the zymogen-protease conversion that evolves over a much longer (100–1,000 msec) time scale (43, 44).
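The three-form equilibrium described above can be sketched as a linear scheme, E* <-> E <-> E:Na+, in which only E binds Na+. The code below computes the equilibrium fraction of each form. The symbols r (the [E*]/[E] ratio) and kd_na (the Na+ dissociation constant of E) are hypothetical names, and the numerical values are placeholders rather than measured constants.

```python
def populations(r, kd_na, na):
    """Equilibrium fractions of E*, E and E:Na+ for the scheme E* <-> E <-> E:Na+.

    r     : [E*]/[E] at equilibrium (hypothetical symbol)
    kd_na : dissociation constant of Na+ from E, in the same units as na
    na    : free Na+ concentration
    """
    # Statistical weights of each form relative to E.
    w_estar, w_e, w_ena = r, 1.0, na / kd_na
    total = w_estar + w_e + w_ena
    return {"E*": w_estar / total, "E": w_e / total, "E:Na+": w_ena / total}


# Placeholder values chosen only to show the trend.
for na in (0.0, 0.1, 1.0):
    fractions = populations(r=0.5, kd_na=0.1, na=na)
    print(na, {form: round(f, 3) for form, f in fractions.items()})
```

Because E* cannot bind Na+ in this scheme, raising the Na+ concentration depletes E* by mass action, one simple way in which a ligand can shift the E*–E equilibrium toward the active forms.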

Figure 2.

A: The E* form shows a collapse of the 215–217 β-strand into the active site. The surface (wheat) refers to trypsin oriented as in Fig. 1. Sticks refer to the superimposed structures of thrombin (3BEI, green) (45), αI-tryptase (1LTO, cyan) (46), and prostasin (3DFJ, yellow) (47). B: Details of the collapse that completely occludes access to the active site. Black labels refer to trypsin. White labels indicate the positions of W215 and E217 in the E* form of thrombin.

E* is not a peculiarity of thrombin. Some of the most striking features of the E* conformation, like the collapse of the 215–217 β-strand into the active site and disruption of the oxyanion hole, have been observed in other structures of serine proteases in the free form (Fig. 2). In the inactive form of αI-tryptase, D216 collapses into the oxyanion hole (46). A collapsed 215–217 segment is observed in the structures of the high-temperature-requirement-like protease (48), complement factor D (49), granzyme K (50), hepatocyte growth factor activator (51), prostate kallikrein (52), and prostasin (47). A disrupted oxyanion hole is observed in complement factor B (53) and the arterivirus protease nsp4 (54). In all these cases the I16-D194 ion-pair remains intact, and therefore the conversion of E* into the active form E of the protease cannot involve the canonical zymogen-protease conversion or a cofactor-induced zymogen activation as observed for the prothrombin-staphylocoagulase interaction (55). The conformational transition of E* into E must follow other routes, as demonstrated recently for thrombin in unprecedented detail (45). Binding of ligands to E* away from the active site region organizes a network of H-bonds that induces a long-range rearrangement of the 215–217 β-strand and a flip of the oxyanion hole back into the active conformation of the E form (45).

ALLOSTERY AND SERINE PROTEASES

All these recent observations support E* as an inactive form of the protease in allosteric equilibrium with the active form E, regardless of the ability of the protease to bind Na+ and further convert E into E:Na+. In other words, serine proteases are allosteric enzymes whose activity can be modulated by affecting the E*–E equilibrium. Allostery is often associated with large-scale conformational transitions in multimeric proteins like hemoglobin (56), aspartate transcarbamylase (57), the nicotinic receptor (58), or GroEL (59). However, the ability of monomeric enzymes to express complex kinetic behavior like hysteresis (60), cooperativity (61), and allosteric regulation (62) has long been recognized. The molecular plasticity documented in monomeric proteins like serine proteases is further testimony to the importance of allostery as an intrinsic property of all dynamic proteins closely intertwined with catalysis (63). This plasticity undoubtedly dictates many aspects of protease function, regulation, and even substrate specificity (35, 64, 65).

What is the physiological advantage of the E*–E equilibrium? For each protease within its biological niche, activity can be fine-tuned by properly setting the E*–E equilibrium. In the case of complement factors, kallikreins, tryptase, and some coagulation factors, activity must be kept to a minimum until binding of a trigger factor ensues. Stabilization of E* may afford a “resting” state of the protease waiting for action, a strategy implemented by other enzymes (66–68). The role of E* can be exploited in engineering studies aimed at stabilizing a protease in the inactive form until a cofactor unravels its full catalytic power. In fact, thrombin has been redesigned to display activity only in the presence of thrombomodulin and protein C along the anticoagulant pathway (69, 70). Notably, the structures of two anticoagulant variants have been solved (71, 72) and show collapse of the 215–217 β-strand and disruption of the oxyanion hole as seen in E* (40). These are exciting new developments that challenge old paradigms about the structural rigidity of the trypsin fold and the molecular basis of specificity. Future work on the properties of E* in serine proteases promises to be ground-breaking.

Acknowledgements

This work was supported by NIH research grants HL49413, HL58141, and HL73813.
