Crystal structure of an alkaline serine protease from Nesterenkonia sp. defines a novel family of secreted bacterial proteases


  • Na Yang,

    1. Shenzhen Graduate School of Peking University, Shenzhen 518055, China
  • Jie Nan,

    1. National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing 100871, China
  • Erik Brostromer,

    1. Shenzhen Graduate School of Peking University, Shenzhen 518055, China
    2. National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing 100871, China
  • Rajni Hatti-Kaul,

    1. Department of Biotechnology, Center for Chemistry and Chemical Engineering, Lund University, S-221 00 Lund, Sweden
  • Xiao-Dong Su

    Corresponding author
    1. Shenzhen Graduate School of Peking University, Shenzhen 518055, China
    2. National Laboratory of Protein Engineering and Plant Genetic Engineering, Peking University, Beijing 100871, China
    • College of Life Sciences Peking University, 100871 Beijing, China


A wide range of bacteria secrete proteases into their extracellular environment for various purposes, such as degrading extracellular proteins to facilitate nutrient transport or effecting bacterial virulence and toxicity.1 A secreted bacterial protease is normally composed of a secretion signal peptide, a propeptide that is cleaved upon activation of the protease, and the mature secreted protease. The mature protease is the functional enzyme and can be isolated from the extracellular medium and characterized.

This article concerns an endoproteinase secreted by an alkaliphilic and moderately halophilic microbe belonging to the Nesterenkonia abyssinica family (originally named Nesterenkonia sp. AL20). This protease, designated NAALP (Nesterenkonia abyssinica alkaline protease), was isolated from an alkaline soda lake in the East African rift valley.2, 3 The bacterium AL20 grows well with chicken feathers as the nutrient source, and NAALP has shown good activity towards casein and hemoglobin as substrates in vitro, with a sequence preference in the order Tyr > Phe > Leu at the P1 site.4 Although the activity profiles of NAALP suggested that the enzyme is a subtilisin-like protease, its activity and stability were calcium independent. NAALP is optimally active at pH 10, 1.0 M NaCl, and 70°C and shows good stability at 50°C in the presence of EDTA and detergents.5

With hundreds of bacterial genomes available in the postgenomic era, thousands of novel proteins, annotated as open reading frames (ORFs), have been identified without biochemical characterization. Sequence searches using NAALP as the probe have revealed dozens of homologues in the sequenced bacterial genomes, the overwhelming majority of which are uncharacterized putative proteins. NAALP thus defines a novel protein family of bacterial secreted proteases, with membership defined by a sequence identity of over 30% to ensure the same structural fold.
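The 30% identity criterion can be illustrated with a minimal Python sketch. It assumes the sequences have already been aligned (gaps written as `-`) and that identity is counted over the columns where both sequences have a residue; other denominators are in common use, and the source does not say which one was applied. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both aligned sequences have a residue.

    Assumption: gap columns ('-') are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def family_members(probe: str, candidates: dict[str, str],
                   threshold: float = 30.0) -> list[str]:
    """Names of candidates whose identity to the probe exceeds the threshold.

    Each candidate sequence must be given as aligned against the probe.
    """
    return [name for name, seq in candidates.items()
            if percent_identity(probe, seq) > threshold]
```

In practice the pairwise identities would come from the FASTA search output itself; the sketch only makes the threshold rule explicit.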

In this report, using high-resolution crystal structure determination, we have unambiguously characterized NAALP and its sequence-related family as trypsin-like serine proteases.


Identification of the NAALP family

The sequence of the mature NAALP enzyme was used to search the European Bioinformatics Institute (EBI) UniProt Knowledgebase through the EBI website.

Representative homologous sequences from the FASTA search results were selected and aligned with the program CLUSTALX.6

Protein structure determination and refinement

The protein preparation, crystallization, and diffraction data collection have been described.7 There are two molecules per asymmetric unit, and a two-fold noncrystallographic symmetry (NCS) was revealed by self-rotation analysis. The crystal structure was determined by the molecular replacement (MR) method using the program MolRep in the CCP4 package.8 The crystal structure of a glutamyl endopeptidase from Bacillus intermedius (PDB ID: 1P3E), which has a sequence identity of 22% and was the closest homolog of NAALP that could be found in the PDB, was used as the search model.9 Based on the sequence alignment, several different constructs of 1P3E were prepared for MR. For each construct, poly-alanine, poly-serine, and partial mutation models were tested with MolRep using exactly the same protocol. Self-rotation results were input for the MR search in the resolution range of 20–3 Å. The final solution was obtained with the poly-serine model of residues 20–215 from chain A of 1P3E. All of the top 30 rotation peaks were used for translation searches. The solution for one molecule was obtained from the first rotation peak and was confirmed by a sharp translation peak (TF/sig value of 6.7, whereas the following peaks were around 3.7). The position of the second molecule was found by fixing the first one, and the resulting dimer was subjected to refinement.
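Preparing a truncated search model of this kind can be sketched as follows. This is a simplified illustration, not the protocol used here: it builds the poly-alanine variant (backbone plus Cβ), because a poly-serine model additionally needs an Oγ atom placed on every residue. The function name is hypothetical, and the column positions follow the fixed-width PDB ATOM record layout.

```python
# Atoms retained in a poly-alanine model: backbone plus the beta carbon.
BACKBONE_AND_CB = {"N", "CA", "C", "O", "CB"}


def make_polyala_model(pdb_lines, chain="A", first=20, last=215):
    """Return ATOM records for residues first..last of one chain,
    truncated to backbone + CB and renamed to ALA.

    Assumptions: well-formed fixed-column PDB ATOM records; glycines
    (which have no CB) are simply renamed without adding one.
    """
    model = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # ignore HETATM, TER, header records, etc.
        atom_name = line[12:16].strip()  # columns 13-16
        chain_id = line[21]              # column 22
        res_seq = int(line[22:26])       # columns 23-26
        if chain_id != chain or not (first <= res_seq <= last):
            continue
        if atom_name not in BACKBONE_AND_CB:
            continue
        # Overwrite the residue name (columns 18-20) with ALA.
        model.append(line[:17] + "ALA" + line[20:])
    return model
```

The default arguments mirror the construct named above (residues 20–215 of chain A); the other constructs mentioned in the text would correspond to different residue ranges.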

After rigid-body refinement by MolRep, the program ARP/wARP was used for further refinement and automatic model tracing.10 Refinement of the high-resolution (1.39 Å) structure was carried out with the program Refmac5, combined with manual rebuilding of the model in the graphics program COOT.11, 12 The stereochemical quality of the model was evaluated with PROCHECK.13 The data collection and structure refinement statistics are listed in Table I. Structure factors and coordinates have been deposited in the PDB (PDB ID: 3CP7).

Table I. Refinement Statistics and Model Quality
Values in parentheses are for the outer (highest) resolution shell.

Data collection
  Space group                           R3
  Unit cell parameters (Å, °)           a = b = 92.26, c = 137.88, α = β = 90, γ = 120
  No. of reflections observed           283010
  No. of unique reflections             86325
  Resolution range (Å)                  30.0−1.39 (1.44−1.39)
  Rmerge (last shell)                   0.047 (0.193)
B factor from Wilson plot (Å²)          18.16
  Reflections used in refinement        84142
  Rcryst/Rfree (%)                      17.8/19.9
  No. of molecules/asymmetric unit      2
  No. of non-H atoms                    3871
  No. of solvent molecules              657 H2O + 2 FMT
  rmsd of bond lengths (Å)              0.009
  rmsd of bond angles (°)               1.3
Averaged B-factors (Å²)
  Protein atoms                         Monomer A: 11.9
                                        Monomer B: 12.8
  FMT atoms                             12.4
  Water molecules                       26.2
Ramachandran plot, residues in
  Core region (%)                       82.9
  Additional allowed region (%)         16.2
  Generously allowed region (%)         0.9
  Disallowed region (%)                 0.0
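A few quantities follow directly from the values in Table I. The short Python sketch below derives the unit cell volume from the general triclinic formula, the average multiplicity of the data (observed over unique reflections), and the gap between Rfree and Rcryst. These derived numbers are not reported in the table itself, and the variable names are illustrative.

```python
import math


def unit_cell_volume(a, b, c, alpha_deg, beta_deg, gamma_deg):
    """Unit cell volume in cubic angstroms from cell lengths (Å) and angles (°)."""
    ca, cb, cg = (math.cos(math.radians(x))
                  for x in (alpha_deg, beta_deg, gamma_deg))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg
                                 + 2 * ca * cb * cg)


# Values taken from Table I (space group R3, hexagonal setting).
volume = unit_cell_volume(92.26, 92.26, 137.88, 90.0, 90.0, 120.0)
multiplicity = 283010 / 86325  # observed / unique reflections
r_gap = 19.9 - 17.8            # Rfree - Rcryst, in percentage points

print(f"cell volume  ~ {volume:.3e} Å^3")
print(f"multiplicity ~ {multiplicity:.2f}")
print(f"Rfree-Rcryst ~ {r_gap:.1f} %")
```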


NAALP has defined a novel family of bacterial secreted serine proteases

Among about 100 returned search results, more than 95% of the sequences were previously uncharacterized putative proteins or ORFs. Using a criterion of 30% sequence identity to define the NAALP-like family, we selected about 30 sequences, all of which were annotated as putative proteins without any biochemical characterization, except for NAALP. Hence, the NAALP-like family constitutes a novel family of bacterial secreted serine proteases. Thirteen sequences (including NAALP itself) were selected as representatives of this family for a structure-based multiple sequence alignment by CLUSTALX, as shown in Figure 1(A). The selected protein sequences are from the following bacterial sources:

A0JTW5_ARTS2: Arthrobacter sp.; A1R423_ARTAT: Arthrobacter aurescens; A8L163_9ACTO: Frankia sp. EAN1pec; Q2J634_FRASC: Frankia sp.; Q47RY4_THEFY: Thermobifida fusca; Q8EM66_OCEIH: Oceanobacillus iheyensis; Q6AG07_LEIXX: Leifsonia xyli; Q2MG24_MICEC: Micromonospora echinospora; A8M1I5_SALAI: Salinispora arenicola; A0LN25_SYNFM: Syntrophobacter fumaroxidans; A6W7X5_KINRD: Kineococcus radiotolerans; A8CXG6_9CHLR: Dehalococcoides sp.

Figure 1(A) clearly shows that all the structural elements and functionally important residues are very well conserved in the NAALP-like family, including the active site triad (S169, H41, and D91, as numbered in NAALP) labeled by filled squares, the oxyanion hole labeled by filled rings, and the two intramolecular disulfide bridges, C23-C42 and C144-C162.
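Disulfide bridges such as C23-C42 and C144-C162 can be checked in a coordinate file by measuring the distance between cysteine Sγ atoms, since a disulfide S–S bond is close to 2.05 Å. The sketch below is a minimal illustration under assumptions: the function name is hypothetical, the 2.5 Å cutoff is an assumed tolerance rather than a value from this work, and the Sγ coordinates are expected to have been extracted from the model beforehand.

```python
import math


def disulfide_pairs(sg_coords, max_distance=2.5):
    """Find cysteine pairs whose SG atoms are within bonding distance.

    sg_coords: dict mapping residue number -> (x, y, z) of the Cys SG atom.
    max_distance: assumed cutoff in Å (a disulfide S-S bond is ~2.05 Å).
    Returns a list of (residue_i, residue_j) tuples with residue_i < residue_j.
    """
    residues = sorted(sg_coords)
    pairs = []
    for i, r1 in enumerate(residues):
        for r2 in residues[i + 1:]:
            if math.dist(sg_coords[r1], sg_coords[r2]) <= max_distance:
                pairs.append((r1, r2))
    return pairs
```

Applied to the Sγ atoms of either monomer in the deposited model, a check of this kind would be expected to return the two pairs named above.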

Figure 1.

(A) Structure-based multiple sequence alignment of representative members of the NAALP family; see text for the names of the source microorganisms. The active site triad (S169, H41, and D91, as numbered in NAALP) is labeled by filled squares, the oxyanion hole by filled rings, and the two intramolecular disulfide bridges, C23-C42 and C144-C162, by green numbers. The figure was produced with ESPript 2.2. (B) Topology diagram of the NAALP structure colored according to the secondary structure elements and domains, with all helices in red and β strands in green (domain I) and yellow (domain II). (C) Cα trace of the NAALP dimer showing Mol A in yellow and Mol B in green, with helices α1 and α3 predominantly forming the dimer interface. (D) Detailed Cα trace of Mol A from Figure 1(C) with the secondary structure elements numbered, the active site triad labeled as S169, H41, and D91, and the two intramolecular disulfide bridges labeled as C23-C42 and C144-C162, with C atoms in yellow, N atoms in blue, O atoms in red, and S atoms in green. (E) 2Fo-Fc density map of the active site of molecule A contoured at 1.0 σ. The active site triad is labeled as His41, Asp91, and Ser169, with C atoms in green, N atoms in blue, O atoms in red, and the density map in light blue. The oxyanion hole is formed by the main-chain NH groups of Gly167 and Ser169. A formic acid molecule is bound in the active site, with its C atom in yellow and O atoms in red.

Overall structure of NAALP

The final model of NAALP was refined to 1.39 Å with an R-factor and free R-factor of 17.8% and 19.9%, respectively. Refinement statistics and model quality are listed in Table I. The structure contains two molecules in one asymmetric unit [Fig. 1(C), labeled as Mol A and Mol B]. The two molecules are very similar in three-dimensional structure, and each adopts a typical trypsin-like fold consisting of two lobes, each formed by a six-stranded β-barrel. In addition to its β-protein features, NAALP also contains two short α helices, a longer α helix (α3), and three turns at the C-terminus [Fig. 1(B,C)]. In the crystal lattice, two molecules of NAALP pack together to form a crystallographic dimer with an interacting surface of about 880 Å², which is in the range of a weak protein dimer (the interacting surface area of the NAALP dimer was calculated with an online server). The dimer interface is formed mainly by the α1 and α3 helices from both monomers packing against each other, predominantly through H-bonds, salt bridges, and hydrophobic interactions [Fig. 1(C)].

In previous studies, because the full-length sequence was not available, NAALP was assigned to the subtilisin-like family of serine proteases, mainly on the basis of its biochemical features.2, 4 With further bioinformatics annotation after the structure became available, NAALP has been classified as trypsin-like in the CATH Protein Structure Classification database and the Pfam Protein Families database. Furthermore, our structural results show that NAALP is very similar to trypsin in three-dimensional structure and that all the functional elements of a trypsin-like serine protease are completely conserved. The NAALP-like family therefore has a trypsin-like structure and function.

The active sites of NAALP

The active sites in both molecules of NAALP are very similar and are readily identified as being in the active form, with an intact catalytic triad (S169, H41, and D91) and oxyanion hole, as depicted in Figure 1(D,E). A formic acid molecule (from the 2.9 M sodium formate in the crystallization buffer) is observed in both active sites, with one oxygen atom positioned in the oxyanion hole formed by the main-chain NH groups of Gly167 and Ser169, somewhat resembling the new carboxy terminus of a cleaved substrate.