We acknowledge funding from the DOE “Genomics to Life” program (grant no. DE-FG02-04ER63786). Use of the Advanced Photon Source (Argonne, IL) was supported by the U.S. Department of Energy, Basic Energy Sciences, Office of Science, under contract no. W-31-109-Eng-38. Use of the GM/CA-CAT beamline has been funded in whole or in part with Federal funds from the National Cancer Institute (Y1-CO-1020) and the National Institute of General Medical Sciences (Y1-GM-1104). We acknowledge the help and advice from Dr. Valentina A. Terechko (University of Chicago) on X-ray structure refinement.
Convergent Chemical Synthesis and Crystal Structure of a 203 Amino Acid “Covalent Dimer” HIV-1 Protease Enzyme Molecule†
Article first published online: 24 JAN 2007
Copyright © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Angewandte Chemie International Edition
Volume 46, Issue 10, pages 1667–1670, February 26, 2007
How to Cite
Torbeev, Vladimir Yu. and Kent, Stephen B. H. (2007), Convergent Chemical Synthesis and Crystal Structure of a 203 Amino Acid “Covalent Dimer” HIV-1 Protease Enzyme Molecule. Angew. Chem. Int. Ed., 46: 1667–1670. doi: 10.1002/anie.200604087
- Issue published online: 19 FEB 2007
- Manuscript Revised: 4 DEC 2006
- Manuscript Received: 4 OCT 2006
- DOE. Grant Number: DE-FG02-04ER63786
- U.S. Department of Energy, Basic Energy Sciences, Office of Science. Grant Number: W-31-109-Eng-38
- National Cancer Institute. Grant Number: Y1-CO-1020
- National Institute of General Medical Sciences. Grant Number: Y1-GM-1104
- convergent synthesis;
- native chemical ligation;
- protein synthesis
The total chemical synthesis of proteins with sizes larger than about 15 kDa is still a challenging task, even when utilizing modern methods for the ligation of unprotected peptides.1 The most effective ligation chemistry is the thioester-mediated amide-forming reaction at Cys residues (“native chemical ligation”),2 and peptides are typically ligated sequentially in the C-to-N terminal direction.3, 4 Because of handling and other losses, synthesis by sequential reactions is inefficient (even in the case of “one-pot” ligations4), and the yield of the final polypeptide is consequently low. Recent advances in convergent methodology for the total chemical synthesis of proteins have been proposed to improve the situation.5
In our recently reported “kinetically controlled ligation” strategy,5 the peptide 1-(αthioarylester) selectively reacts with a Cys-peptide 2-(αthioalkylester) (in the absence of added thiol) to form the peptide 1-peptide 2-(αthioalkylester) product in high yield, because of the higher intrinsic reactivity of αthioarylesters. This simple concept has two important implications. First, synthesis (including sequential ligation) from the N-terminal segment towards the C-terminal segment has become possible. Second, two large polypeptides can be assembled in this way, one having a thioester moiety on the C terminus and the other one having a Cys residue on the N terminus. Native chemical ligation of these two large polypeptides at the final stage of the synthesis constitutes a fully convergent approach to the total synthesis of proteins.
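The efficiency argument for a convergent route can be illustrated with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that every ligation-and-purification step has the same fractional yield (the 50 % used here is a hypothetical value, not a figure from this work) and compares the number of steps on the longest linear path for a sequential and a fully convergent assembly of the same number of segments. The function names are illustrative only.

```python
def sequential_steps(n_segments: int) -> int:
    """Ligations on the longest linear path when segments are added one at a time."""
    return n_segments - 1


def convergent_steps(n_segments: int) -> int:
    """Ligations on the longest linear path when segments are joined pairwise
    in a balanced tree, i.e. ceil(log2(n_segments)) for n_segments >= 1."""
    return (n_segments - 1).bit_length()


def path_yield(steps: int, step_yield: float) -> float:
    """Yield along a path of identical steps (simplifying assumption)."""
    return step_yield ** steps


if __name__ == "__main__":
    n = 4               # four peptide segments, as in the synthesis described here
    assumed_yield = 0.5  # hypothetical per-step yield, for illustration only
    for label, steps in (("sequential", sequential_steps(n)),
                         ("convergent", convergent_steps(n))):
        print(f"{label}: {steps} steps, "
              f"path yield {path_yield(steps, assumed_yield):.3f}")
```

Under these assumptions, four segments need three consecutive ligations in a sequential route but only two along the longest path of a convergent route, so the material that reaches the final product passes through fewer lossy steps.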
We are undertaking detailed studies of the enzymatic mechanism of HIV-1 protease, one of the targets in the therapeutic treatment of AIDS.6 In its native form, the HIV-1 protease enzyme molecule is a homodimer of two polypeptide chains each containing 99 amino acid residues and with a single active site formed at the dimer interface.7 The chemical analogues we are constructing to investigate the catalytic mechanism will incorporate different functionalities in the polypeptide chains of the two monomers. To enable nonsymmetric incorporation of functionalities (or labels), the two 99-residue monomers have to be covalently joined through a short linker. Previous approaches to covalent linking have included recombinant expression of polypeptides of approximately 210 residues,8 or have employed a synthetic strategy involving the directed formation of a disulfide bond between the two chains.9 Although enzymes made in this way have led to insights about the catalytic mechanism,9b the overall synthesis is inefficient and, thus, a more robust synthetic route was required for further work. Herein we report the convergent chemical synthesis of a polypeptide chain with 203 amino acids10 from four peptide segments. We demonstrate the full catalytic activity of the resulting enzyme molecule and report its high-resolution X-ray structure.
The first step in the convergent synthesis of the target polypeptide (Scheme 1) is the kinetically controlled ligation of the two peptide segments (A1-A40)-(αthioarylester) (1) and Cys-(A42-A99)-(αthioalkylester) (2). Segment 1 was obtained after transthioesterification of (A1-A40)-(αthioalkylester) with an excess of 4-mercaptophenylacetic acid.11 Ligation was performed at pH 6.3 to slow down all the reactions and thus gain better overall control. Two main by-products were present in the reaction mixture (Figure 1, Scheme 2). One is the branched thioester 7, formed by reaction of the ligation product (A1-CysA41-A99)-(αthioalkylester) (3) with 1. The second is the internal thiolactone 8, formed from intramolecular transthioesterification of the ligation product. After an empirically determined optimal reaction time of one hour, excess 4-mercaptophenylacetic acid was added to give a total concentration of 200 mM at pH 6.0; this leads to breakdown of the branched thioester 7, thereby releasing more of the ligation product 3 and regenerating starting peptide 1, which can further ligate with any remaining 2. Moreover, both the internal thiolactone 8 and the ligation product 3 undergo transthioesterification to form the desired ligation product 4. The sulfhydryl functionality of CysA41 was subsequently capped with 2-bromoacetamide to form ψ-GlnA41 at the ligation site.
The segment Cys-Gly4-(B1-B99) was synthesized by conventional native chemical ligation.2 Two peptides Thz-Gly4-(B1-B40)-(αthioalkylester) and Cys-(B42-B99) were ligated at pH 7.0 using 4-mercaptophenylacetic acid as a catalyst.11 Residue CysB41 at the ligation site was then alkylated with 2-bromoacetamide and the ligation product treated with MeONH2⋅HCl to convert the N-terminal thiazolidine into a Cys residue.
The purified segments (A1-A99)-(αthioarylester) and Cys-Gly4-(B1-B99) were then joined together by native chemical ligation to form a final polypeptide chain consisting of 203 amino acids (Figure 2). The Cys residue at the final ligation site was converted into ψ-Gln by treatment with 2-bromoacetamide. After removal of the formyl protecting groups from the tryptophan residues,12 the product was purified by reversed-phase HPLC (RP-HPLC; 6.7 % overall yield of isolated product based on the limiting peptide segment). The 203-residue synthetic polypeptide was characterized by LC-MS, and further analyzed by Fourier-transform ion cyclotron resonance mass spectrometry (FT-ICR-MS) (Figure 3). Within the limits of experimental uncertainty, the product had the expected mass (found 21 869.8±0.4 Da; calcd 21 869.8 Da, average isotope composition).
The synthetic polypeptide was folded by two-step dialysis against acetate buffer at pH 5.6 (29 % yield). A standard fluorogenic assay of the enzymatic activity was performed in 50 mM NaOAc and 0.2 M NaCl at pH 5.6 and 37 °C with Abz-Thr-Ile-Nle-Phe(p-NO2)-Gln-Arg.amide (Abz=2-aminobenzoyl).13 The kcat and Km values of 10.3±0.2 s−1 and 27±1.4 μM, respectively (see the Supporting Information), are in agreement with previously reported data recorded under similar assay conditions.14 As a control, a chemically synthesized homodimeric HIV-1 protease (that is, 2×99 residues) was assayed under the same conditions (kcat=9.8±0.2 s−1, Km=25±1.4 μM).
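To compare the two enzymes on a single scale, the reported constants can be combined into the catalytic efficiency kcat/Km, and the Michaelis–Menten relation v/[E] = kcat[S]/(Km + [S]) gives the turnover rate at a chosen substrate concentration. The sketch below uses only the kcat and Km values quoted above; the 50 μM substrate concentration is an arbitrary illustrative choice, the function names are hypothetical, and the quoted uncertainties are not propagated.

```python
def catalytic_efficiency(kcat_per_s: float, km_uM: float) -> float:
    """Return kcat/Km in uM^-1 s^-1."""
    return kcat_per_s / km_uM


def turnover_rate(kcat_per_s: float, km_uM: float, substrate_uM: float) -> float:
    """Michaelis-Menten rate per enzyme molecule (s^-1) at a given [S]."""
    return kcat_per_s * substrate_uM / (km_uM + substrate_uM)


# (kcat in s^-1, Km in uM) as reported in the text
enzymes = {
    "covalent dimer (203 residues)": (10.3, 27.0),
    "homodimer (2 x 99 residues)": (9.8, 25.0),
}

if __name__ == "__main__":
    substrate_uM = 50.0  # illustrative substrate concentration, not from the paper
    for name, (kcat, km) in enzymes.items():
        eff = catalytic_efficiency(kcat, km)
        rate = turnover_rate(kcat, km, substrate_uM)
        print(f"{name}: kcat/Km = {eff:.3f} uM^-1 s^-1, "
              f"v/[E] at {substrate_uM:.0f} uM = {rate:.2f} s^-1")
```

Both enzymes come out at roughly 0.38 to 0.39 μM⁻¹ s⁻¹, which is consistent with the statement that the covalent dimer is as active as the homodimeric control.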
X-ray structural analysis was performed to verify whether the synthetic protein had the correct three-dimensional fold of the HIV-1 protease covalent dimer enzyme. Crystals grown in the presence of the inhibitor MVT-101 (Ac-Thr-Ile-Nle-ψ-(CH2NH)-Nle-Gln-Arg.amide) were isomorphous to those of previously reported synthetic and recombinantly expressed HIV-1 proteases, and diffracted to a resolution of 1.65 Å. The X-ray structure of the protein molecule with 203 amino acids (Figure 4) was found to be essentially identical to the previously reported structures of homodimeric HIV-1 protease,15 as well as to those of recombinantly expressed tethered dimers of HIV-1 protease,16 with the linker region being partially disordered.
This 21 870 Da protein with full enzymatic activity and correct three-dimensional structure is, to the best of our knowledge, the largest linear polypeptide chain prepared to date by chemical synthesis. The total synthesis of a protein of this size, in a straightforward fashion, demonstrates the great potential of recently developed methods for the fully convergent chemical synthesis of proteins. Facile synthetic access to the 203-residue “covalent dimer” HIV-1 protease will enable the preparation of a wide range of unique chemical analogues to systematically dissect the molecular basis of the function of this important enzyme.
In a kinetically controlled ligation, (A1-A40)-αCOSC6H4CH2COOH (8.2 mg, 1.8 μmol) and (CysA41-A99)-αCOSCH2CH2Arg4 (13.5 mg, 1.9 μmol) were dissolved in aqueous buffer (1.46 mL) containing 6 M Gn⋅HCl, 0.2 M Na2HPO4, and 19 mM TCEP at pH 6.3. After 1 h, 4-mercaptophenylacetic acid was added to give a total concentration of about 200 mM and the pH value was adjusted to 6.0. After 3 h, 2-bromoacetamide (52 mg, 0.377 mmol) was added and the pH value adjusted to 6.7. After 15 min, 4-mercaptophenylacetic acid (51 mg, 0.304 mmol) was added to neutralize the excess of 2-bromoacetamide. The product was purified by RP-HPLC with a shallow gradient of water/acetonitrile with 0.1 % trifluoroacetic acid (TFA). LC-MS: found: 10 956±0.8 Da, calcd: 10 955.9 Da (Figure 1). Yield of isolated product 5.7 mg (0.52 μmol, 29 %). For the synthesis of Cys-Gly4-(B1-B99) see the Supporting Information.
In the final native chemical ligation, (A1-A99)-αCOSC6H4CH2COOH (4.8 mg, 0.44 μmol) and Cys-Gly4-(B1-B99) (5.5 mg, 0.49 μmol) were dissolved in buffer (1.6 mL) containing 8 M Gn⋅HCl, 0.1 M Na2HPO4, and 25 mM TCEP. 4-Mercaptophenylacetic acid was added to give a concentration of 50 mM and the pH value was adjusted to 7.0. After 12 h, the reaction mixture was diluted with buffer (1 mL), and 2-bromoacetamide (100 mg, 0.72 mmol) was added at pH 6.7. After 15 min, the reaction was quenched with an excess of 4-mercaptophenylacetic acid. Deformylation was performed by treatment with a mixture of 2-mercaptoethanol and piperidine (1:1 (v/v), 3.6 mL) on ice for 15 min, and then neutralizing with HCl. The reaction mixture was diluted twofold with buffer (6 M Gn⋅HCl, 0.1 M Na2HPO4) and purified by RP-HPLC. LC-MS: found: 21 869.8±0.4 Da, calcd: 21 869.8 Da. Yield of isolated product 2.2 mg (0.1 μmol, 23 %). The overall yield based on the limiting peptide segment is 6.7 %.
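The yield figures in the two procedures above follow from simple mass-to-mole arithmetic, which can be retraced as in the sketch below. It uses only the isolated masses, calculated molecular masses, and limiting amounts quoted in the text; the helper names are illustrative, and treating the overall yield as the product of the two step yields is the usual convention rather than something spelled out in the text.

```python
def micromoles(mass_mg: float, molar_mass_g_per_mol: float) -> float:
    """Convert a mass in mg to an amount in micromoles."""
    # mg / (g/mol) gives mmol; multiply by 1000 to obtain umol.
    return mass_mg / molar_mass_g_per_mol * 1000.0


def percent_yield(product_umol: float, limiting_umol: float) -> float:
    """Isolated yield in percent, based on the limiting reagent."""
    return 100.0 * product_umol / limiting_umol


# Step 1: kinetically controlled ligation (5.7 mg product, 1.8 umol limiting segment).
step1 = percent_yield(micromoles(5.7, 10955.9), limiting_umol=1.8)
# Step 2: final native chemical ligation (2.2 mg product, 0.44 umol limiting segment).
step2 = percent_yield(micromoles(2.2, 21869.8), limiting_umol=0.44)

# Overall yield as the product of the step yields, with and without rounding.
overall_from_masses = step1 * step2 / 100.0
overall_from_rounded = round(step1) * round(step2) / 100.0

if __name__ == "__main__":
    print(f"step 1: {step1:.1f} %   step 2: {step2:.1f} %")
    print(f"overall (unrounded step yields): {overall_from_masses:.2f} %")
    print(f"overall (rounded step yields):   {overall_from_rounded:.2f} %")
```

Multiplying the rounded step yields (29 % and 23 %) reproduces the quoted 6.7 %; carrying the unrounded values through gives a slightly lower figure of about 6.6 %, a difference that is well within the precision of the weighed masses.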
For more experimental details see the Supporting Information.
- 10 Sequence of the tethered construct of HIV-1 protease (from the N to the C terminus): PQITLWKRPLA10 VTIRIGGQLKA20 EALLDTGADDA30 TVIEENleNLPGA40 ψ-GlnWKPKNleIGGIA50 GGFIKVRQYDA60 QIPVEIAbuGHKA70 AIGTVLVGPTA80 PVNIIGRNLLA90 TQIGAbuTLNFA99 ψ-Gln201GGGG205 PQITLWKRPLB10 VTIRIGGQLKB20 EALLDTGADDB30 TVIEENleNLPGB40 ψ-GlnWKPKNleIGGIB50 GGFIKVRQYDB60 QIPVEIAbuGHKB70 AIGTVLVGPTB80 PVNIIGRNLLB90 TQIGAbuTLNFB99. Unnatural amino acids are shown in italics in three-letter code. Nle=norleucine, Abu=α-aminobutyric acid, ψ-Gln=pseudo-homoglutamine. Residues from the 99-residue section at the N terminus (part A) are specified by the letter “A” placed before the number of the residue, correspondingly the 99-residue part at the C terminus (part B) has the letter “B” before the number. The five amino acid linker region is numbered from 201 to 205. Ligation sites are underlined.
- 12 Four tryptophan residues are present: WA6, WA42, WB6, and WB42.
Supporting information for this article is available on the WWW under http://www.wiley-vch.de/contents/jc_2002/2007/z604087_s.pdf or from the author.