Cysteine peptidases (CPs) are ubiquitous enzymes that play fundamental roles in many cellular metabolic pathways.1 In mammalian cells, they are highly regulated in apoptotic pathways related to cancer and other severe disorders.2, 3 In bacterial cell division, cell growth, and lysis, peptidases are employed in the cleavage of peptidoglycans (autolysins).4, 5 As secreted antigens and toxins, CPs are key to virulence in Gram-positive pathogens6 and are used to attack competing species in bacterial warfare.7 Viral genomes encode forms of the enzyme for the purpose of capsid trimming during maturation.8 CPs have been classified in clans based on their evolutionary relationships.9 In particular, the CHAP (cysteine, histidine-dependent amidohydrolases/peptidases) domain (pfam: PF05257, MEROPS ID: C51) is classified as an endopeptidase and is part of the CA clan of peptidases (CL0125). This 27-member superfamily includes synthetase/amidases, peptidases, viral proteinases, the NPLC/P60 families, and other papain-related families.10 Pfam 22.0 lists 602 members for the CHAP domain family, including 475 from the bacterial kingdom (396 in the firmicutes phylum without any known 3D structure), 39 from eukaryota, and 81 from viruses and phage (predominantly involving Gram-positive phage). CHAP domains are often associated with other peptidases, bifunctional glutathionyl spermidine (GSP) amidases (Type 2 and 3), choline-binding domains, and SH3 and/or Von Willebrand (VWA) domains to form multidomain systems that act cooperatively as versatile machineries for murein septum processing while anchored to the cell surface. Deletion of the CHAP-containing cse gene in Streptococcus thermophilus results in impaired separation of cells during mitosis, demonstrating its involvement in cell division.11 The staphylococcal phage ϕ11 hydrolase, a CHAP domain-containing enzyme, exhibits D-alanyl-glycyl endopeptidase and N-acetylmuramyl-L-alanyl amidase activity.12
In this note, we present the solution NMR structure of the CHAP domain encoded by gene SSP0609 of Staphylococcus saprophyticus [SWISS-PROT ID: Q49ZM2_STAS1; NESG target ID: SyR11]. S. saprophyticus, one of the three major human Gram-positive pathogens, possesses anchoring fimbriae that help it colonize the urinary tract, resulting in infection.13 Bacterial antigens are known virulence agents,7 and SSP0609 is a secreted antigen with potential roles in the onset of S. saprophyticus infection. The protein comprises a type-I signal peptide in the N-terminal region (res. 1–47) and a globular CHAP domain in the C-terminal region (res. 48–155). The SSP0609 CHAP domain was found to have a highly conserved Cys-His-Glu-Asn proteolytic relay active site.
MATERIALS AND METHODS
Uniformly 13C,15N- and 5%-13C, U-15N-enriched Staphylococcus saprophyticus SSP0609 samples were produced following standard protocols of the NESG consortium.14 A complete description of the molecular biology, protein purification, sample preparation, NMR data acquisition, analysis and structure calculation, and validation methods used in this work is given in the Supporting Information. The 16.98 kDa protein construct studied here includes the full-length protein sequence with a C-terminal affinity tag (LEHHHHHH), starting at position 156. Protein samples for NMR spectroscopy were concentrated to 0.7–0.9 mM in 95% H2O/5% D2O solution containing 20 mM MES, 100 mM NaCl, 10 mM DTT, 0.02% NaN3, and 5 mM CaCl2 at pH 6.5. NMR measurement of the T1 and T2 relaxation rates (τc = 7.1 ± 0.1 ns at 25°C) and gel-filtration chromatography with mass detection by static light scattering confirmed the monomeric state of SSP0609 (Supp. Info. Figs. S1 and S2). Triple resonance NMR data were collected at 25°C on Bruker AVANCE 600 and 800 MHz NMR spectrometers. Data analysis, including largely automated backbone assignment, manual sidechain assignments, and manual NOESY peak-picking from 15N and 13C-edited NOESY in H2O and D2O, provided the peak list files necessary to run CYANA 2.1 and AutoStructure 2.1.1 (Supp. Info. Fig. S3).15–17 Following iterative cycles of noise/artifact peak removal and minor assignment modifications, a complete NOESY peak list set was obtained. This set was used in a final CYANA 2.1 run in which the resulting 20 lowest target function structures were further refined with constrained molecular dynamics in explicit water.18 The resulting final structures underwent extensive presubmission validation protocols.19, 20 Structural and biochemical characterization of the active site was carried out by NMR. The tautomeric state and the pKa of His109 were determined by 1H-15N HMQC [Fig. 2(A)]21 and by pH titration of the U-13C,15N SSP0609 while monitoring the changes in chemical shift by 2D NMR [Fig. 2(B) and further information in Supporting Information], respectively. Details of electrostatic calculations are outlined in the Supporting Information. Structural bioinformatics analysis of the resulting structure was facilitated using the Mark-Us server.22
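The rotational correlation time quoted above is derived from 15N T1 and T2 relaxation data. As a minimal sketch of how such an estimate is commonly obtained, the Python function below uses the widely used slow-tumbling approximation τc ≈ sqrt(6·T1/T2 − 7) / (4π·νN), which neglects internal motion and chemical exchange. The function name, the 15N Larmor frequency (about 60.8 MHz on a 600 MHz spectrometer), and the T1 and T2 values are illustrative assumptions, not measurements from this work; the actual protocol is described in the Supporting Information.

```python
import math


def rotational_correlation_time(t1_s: float, t2_s: float, nu_n_hz: float) -> float:
    """Estimate tau_c (seconds) from 15N T1 and T2 (seconds).

    Uses the slow-tumbling approximation
        tau_c ~ sqrt(6 * T1/T2 - 7) / (4 * pi * nu_N),
    where nu_N is the 15N Larmor frequency in Hz.
    """
    arg = 6.0 * (t1_s / t2_s) - 7.0
    if arg <= 0.0:
        # The approximation is only meaningful when T1/T2 > 7/6.
        raise ValueError("T1/T2 is too small for the slow-tumbling approximation")
    return math.sqrt(arg) / (4.0 * math.pi * nu_n_hz)


# Hypothetical inputs chosen only to illustrate the call; not data from this study.
NU_N_600 = 60.8e6  # assumed 15N frequency (Hz) on a 600 MHz instrument
tau_c = rotational_correlation_time(t1_s=0.70, t2_s=0.10, nu_n_hz=NU_N_600)
print(f"tau_c = {tau_c * 1e9:.2f} ns")
```

A correlation time in the range of several nanoseconds at 25°C is what one would expect for a monomeric protein of roughly 17 kDa, which is why this quantity is used alongside light scattering as a check on the oligomeric state.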
RESULTS AND DISCUSSION
Structural statistics for the solution NMR structure of S. saprophyticus SSP0609 are listed in Supp. Info. Table S1; the structure quality scores are excellent. In agreement with disorder prediction analysis,23 NMR shows the N-terminal portion of the sequence up to residue 49 to be unstructured in solution. A stereoview of the lowest energy structure representative from the final SSP0609 ensemble (residues 50–155) is shown in Figure 1(A). The SSP0609 CHAP domain features the typical ααβββββαβ papain topology. Two short alpha helices (α1, Cys57–Lys64; α2, Trp76–Ala86) are packed against a six-strand beta sheet saddle (β1, Thr89–Asn91; β2, Ser98–Ser101; β3, Val110–Val116; β4, Val122–Glu127; β5, Ser136–Ile141; β6, Asn150–His153). Linking β5 to β6 is a short helical segment (α3, Ala143–Ala146) confirmed by 13C chemical shift and NOESY data. The β-sheet core is formed by four highly conserved hydrophobic residues: Pro94, Val110, Val113, and Val116. The fold, characteristic of the cysteine peptidase superfamily clan CA, as well as the location of the active site in the CHAP domain (described below), were accurately predicted.24
Analysis of the molecular surface of SSP0609 by ConSurf25 [Fig. 1(B)] shows eight highly conserved residues at or near the surface, Gln56, Cys57, Thr58, Gly74, Gly108, His109, Glu126, and Asn128, forming a very broad and shallow cavity. Residues Cys57, His109, and Glu126, highlighted in Figure 1(A), are arranged in the typical clan CA sequence order and topology. This residue triplet has been shown to be a viable proteolytic triad in pyroglutamyl peptidases, a family (C15) of otherwise unrelated cysteine peptidases (clan CF).26 In SSP0609, the extended charge transfer relay includes Asn128 as an additional stabilizing group. Cys57 is positioned at the beginning of helix α1, followed by His109 on strand β3, Glu126 at the end of strand β4, and Asn128 toward the β4–β5 (BB4) loop (Supp. Info. Fig. S4). Electrostatic surface potential27 [Fig. 1(C) and Supp. Info. Fig. S5 and Table S2] images of SSP0609 show the active site cleft with several polar residues, consistent with peptidase active sites. Overall, the molecule exhibits a weak and diffuse negative charge.
More evidence for the peptidase motif emerges from the structural and biochemical characterization of the SSP0609 active site [Fig. 2(A,B)]. The 1H-15N HMQC pattern clearly shows His109 in the neutral state. The structure quality is significantly improved by the introduction of the correct His109 tautomer in the NMR structure-determination protocol. Under the NMR conditions (pH 6.5), the resulting NOE-based models indicate the Cys57 Sγ and His109 Nδ1 to be within hydrogen bond distance (3.3 Å) [Fig. 2(C)]. The pKa of His109 was found to be 5.5 [Fig. 2(B)], which is indicative of a buried histidine in a highly cooperative hydrogen bond network and typical of such residues in proteolytic triads. By contrast, the surface-exposed His153 sidechain (pKa = 7.4) and histidines in the purification tag (pKa = 6.2) are more basic and protonated (on Nδ1 and Nϵ2) under the NMR conditions. The slightly elevated pKa of His153 is explained by the formation of a salt bridge between the positively charged imidazole Nδ1Hδ1 and the negatively charged Glu95 sidechain Oϵ (His153 Nδ1–Glu95 Oϵ = 2.9 Å).
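The relationship between the measured pKa values and the protonation state at the sample pH can be illustrated with the Henderson-Hasselbalch relation. The Python sketch below assumes an idealized single-site titration with no coupling between titrating groups; the function names and the fast-exchange chemical shift model are illustrative assumptions and do not reproduce the fitting procedure used in this work, which is described in the Supporting Information. The pKa values and sample pH are those quoted in the text.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of the imidazolium (doubly protonated) form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


def observed_shift(ph: float, pka: float, delta_prot: float, delta_neutral: float) -> float:
    """Population-weighted chemical shift under fast exchange (assumed model).

    delta_prot and delta_neutral are the limiting shifts (ppm) of the
    protonated and neutral forms; a pKa is obtained by fitting this curve
    to shifts measured across a pH series.
    """
    f = fraction_protonated(ph, pka)
    return f * delta_prot + (1.0 - f) * delta_neutral


SAMPLE_PH = 6.5  # NMR sample pH from the text
PKA_VALUES = {"His109": 5.5, "His153": 7.4, "tag His": 6.2}  # from the text

for name, pka in PKA_VALUES.items():
    f = fraction_protonated(SAMPLE_PH, pka)
    print(f"{name}: pKa {pka:.1f}, fraction protonated at pH {SAMPLE_PH} = {f:.2f}")
```

Under this idealized model, a group with pKa 5.5 is about 9% protonated at pH 6.5, one with pKa 6.2 about 33%, and one with pKa 7.4 about 89%, so the ordering of protonated populations follows the ordering of the pKa values.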
The presence of the substrate would simultaneously trigger the deprotonation of Cys57 Sγ by His109 Nδ1 and the attack of the Cys57 thiolate on the peptidyl C′, leading to the tetrahedral anionic intermediate.28 The resulting charged species is sequestered by a pair of hydrogen bonds within the active site cavity (oxyanion hole). A variety of suitable hydrogen-bond donors in peptidases are known.29 In the structure of SSP0609, the Gln56 sidechain and/or the backbone NH groups of Gly74 and Gly108 are positioned to fulfill this role. These residues are highly conserved and likely essential for the enzymatic activity. Acquisition of Cys57 SγH by His109 establishes a positive charge on the imidazole ring, which is distributed to the carboxyl sidechain (Oϵ) of Glu126 via interaction with the His109 Hϵ2-Nϵ2 hydrogen-bond donor (His109 Nϵ2–Glu126 Oϵ2 = 3.2 Å) and further distributed to the Asn128 Nδ2 moiety (Glu126 Oϵ1–Asn128 Nδ2 = 4.1 Å).30 Although still well over 70% conserved, the Glu126 residue is the least conserved member of the Cys57-His109-Glu126-Asn128 quartet across the CHAP domain family. In firmicutes, the most common variant for Glu at position 126 is the chemically similar Asp. Interestingly, Glu126Gly substitutions are found in eight pathogenic species, six of which are S. aureus strains. On the basis of the detailed knowledge of the active site gathered in this work, we postulate a substantial rearrangement of the active site in this limited number of firmicute variants.
In S. saprophyticus, the immediate vicinity of the active site lacks a set of large, cavity-defining hydrophobic residues that would confer specificity to the binding site. However, two exposed aromatic groups, Tyr107 and Tyr129, are positioned such that they could function as “rails” on either side of the active site and are likely to have a substrate-anchoring function. Specifically, Tyr107 is located on the β2–β3 (BB2) loop and in close proximity to Cys57, while Tyr129 is located on the β4–β5 (BB4) loop. Tyr107 is >90% conserved in Gram-positive firmicutes, and Tyr129 is highly conserved as an aromatic type (∼70/30 Tyr, Trp/Val) across the entire family.
A search for structurally similar proteins in the Protein Data Bank using the DALI31 server produces several hits with sizeable Z-scores but minimal aligned sequence identity (see full DALI report in Supp. Info. Table S3). These include several structures of proteins from the same peptidase superfamily, including E. coli bifunctional glutathionyl spermidine (GSP) CHAP domains (PDB_IDs 2IOB, 2IO9; sequence_identity 17 and 18%, respectively),32 A. variabilis NLP/P60 (PDB_ID 2HBW; sequence_identity 7%) and N. punctiforme COG0791 (PDB_ID 2EVR; sequence_identity 8%) cell wall-associated hydrolases, and E. coli lipoprotein Spr33 (PDB_ID 2K1G; sequence_identity 9%). The novel modeling leverage for SSP0609 (PDB_ID: 2K3A) is 86 models (UniProt 12.8)34; that is, some 86 protein sequences can be homology modeled using the 3D structure of SSP0609 that could not be modeled using the structures available in the PDB on the date of its deposition.
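The sequence identities quoted above are computed over aligned residue pairs. As a minimal sketch of one common convention, the Python function below counts identical residues over the columns in which both sequences have a residue (gap columns are ignored). The function name and the toy aligned strings are hypothetical and are not taken from the DALI report; the exact normalization used by a given server may differ.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which neither sequence has a gap.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical toy alignment, used only to illustrate the call.
print(percent_identity("AC-DE", "ACGDQ"))
```

In the toy alignment, four columns are gap-free and three of them match, so the function returns 75.0. Identities in the 7 to 18% range, as reported here, mean that fewer than one in five aligned positions carries the same residue even though the folds superimpose well.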
Structure-based sequence alignment [Supp. Info. Fig. S4(A)] and overlay [Supp. Info. Fig. S4(B)] of SSP0609 with the E. coli GSP CHAP domain structure show a much more secluded cysteine active site in the GSP CHAP domain. Although the overall papain fold is identical, the sequence identity is low (18%), and several differences are present in the structure. Most notably, the longer BB2, BB4, and BB5 (substitutes α3 in 2IOB) loops surround the active site with nonpolar residues and produce a well-defined, deep binding pocket. Also, in the E. coli GSP CHAP domain, the Asn128 equivalent does not align structurally with that of SSP0609, which argues against its involvement in the active site. These observations indicate a different substrate-binding mode and substrate scope for SSP0609. Although SSP0609's role as a proteolytic enzyme is clearly indicated by the 3D structure of its active site, characterization of the substrate and exact scope of this enzyme in the life cycle of S. saprophyticus awaits further structural and biochemical studies.
The authors thank Gaohua Liu and Alexander Eletski for helpful discussions. Resonance assignments, raw NMR data, and the 3D NOESY peak lists have been deposited in the BioMagResDB (BMRB ID: 15335), and atomic coordinates have been deposited into the Protein Data Bank (PDB_ID, 2K3A).