Ketoacyl synthases (KSs) catalyze condensing reactions combining acyl-CoA or acyl-acyl carrier protein (acyl-ACP) with malonyl-CoA to form 3-ketoacyl-CoA or with malonyl-ACP to form 3-ketoacyl-ACP. In each case, the resulting acyl chain is two carbon atoms longer than before, and CO2 and either CoA or ACP are formed. KSs also join other activated molecules in the polyketide synthesis cycle. Our classification of KSs by their primary and tertiary structures instead of by their substrates and the reactions that they catalyze enhances insights into this enzyme group. KSs fall into five families separated by their characteristic primary structures, each having members with the same catalytic residues, mechanisms, and tertiary structures. KS1 members, overwhelmingly named 3-ketoacyl-ACP synthase III or its variants, are produced predominantly by bacteria. Members of KS2 are mainly produced by plants, and they are usually long-chain fatty acid elongases/condensing enzymes and 3-ketoacyl-CoA synthases. KS3, a very large family, is composed of bacterial and eukaryotic 3-ketoacyl-ACP synthases I and II, often found in multidomain fatty acid and polyketide synthases. Most of the chalcone synthases, stilbene synthases, and naringenin-chalcone synthases in KS4 are from eukaryota. KS5 members are all from eukaryota, most are produced by animals, and they are mainly fatty acid elongases. All families except KS3 are split into subfamilies whose members have statistically significant differences in their primary structures. KS1 through KS4 appear to be part of the same clan. KS sequences, tertiary structures, and family classifications are available on the continuously updated ThYme (Thioester-active enzYme) database.
Ketoacyl synthases (KSs) (more officially 3-oxoacyl synthases and also known as β-ketoacyl synthases) are the condensing enzymes that catalyze the reaction of acyl-coenzyme A (acyl-CoA) or acyl-acyl carrier protein (acyl-ACP) with malonyl-CoA, malonyl-ACP, or occasionally other substrates. This reaction is a key step in the fatty acid synthesis cycle, as in general it adds two carbon atoms to growing acyl chains (Fig. 1). KSs exist as individual enzymes, which are essential components of type II fatty acid and polyketide synthesis; in addition, KS domains are found in large multidomain enzymes such as type I fatty acid synthases (FASs) and polyketide synthases (PKSs).1
We have gathered KS amino acid sequences (primary structures) and three-dimensional (tertiary) structures, along with those of other members of the fatty acid synthesis cycle, into the continually updated ThYme (Thioester-active enzYme) database.2, 3 At present, ThYme has 21,028 KS primary and 150 KS tertiary structures. In doing this, we divided each of these enzyme groups into different families based on their primary structural differences. In general, single families contain enzyme members that are related to each other by primary and tertiary structure and mechanism, suggesting that they have a common ancestor. Sometimes members of different families are sufficiently related by primary and tertiary structures and by mechanism that they can be classified as part of a clan, implying that they are descended from a more distant common ancestor. Furthermore, we can divide members of a single family into subfamilies by more subtle primary structural differences.
This article is an account of our division of KSs into families, the gathering of some of the families into clans, and the separation of families into subfamilies. We have done this so that, with the help of known KS crystal structures, mechanisms, and substrate specificities, we could rationally predict the properties of KSs according to the phylogenetic trees that we constructed, and so that we could logically choose KSs to produce and study. Furthermore, we have related properties of KSs with the families in which they are located.
A number of small-scale phylogenetic trees of KSs have already been built. An early tree based on seven tertiary structures showed that ketoacyl-ACP synthase from Synechocystis and Escherichia coli ketoacyl-ACP synthases I and II are very similar, as are Saccharomyces cerevisiae degradative thiolase and Zoogloea ramigera biosynthetic thiolase. More distant from each other and from the other two groups are alfalfa chalcone synthase and E. coli ketoacyl-ACP synthase III.4 A phylogenetic tree of 18 known Arabidopsis ketoacyl-CoA synthases and putative genes has been produced to identify these moieties in putative enzymes.5 A phylogenetic study of 40 β-ketoacyl-ACP synthase III enzymes showed that those produced by bacteria (proteobacteria, firmicutes, and bacteroidetes) and an apicomplexan protist species are widely separated from those produced by monocots, dicots, diatoms, cyanobacteria, and red and green algae.6 A phylogeny of mainly mammalian elongases, but with a few from fungi and other eukaryotes, has been published.7 A detailed tree of protozoal and animal fatty acid elongases as part of a much less detailed tree of elongases from many different phyla has also appeared.8 Lee et al.9 assembled a tree of elongases from protozoal parasites and a few yeasts. Finally, polyunsaturated fatty acid elongases, mainly from marine protists, algae, and diatoms, but including those from a few vertebrates, are found in a phylogenetic tree constructed by Iskandarov et al.10
Basic Local Alignment Search Tool (BLAST)11 and multiple sequence alignment (MSA) were used to classify KSs into different families based on primary structure similarities, while crystal structure superpositions and root-mean-square deviation (RMSD) calculations were used on tertiary structures. More complete descriptions of these methods are found in the Supporting Information of an earlier article.2
The query sequences used for BLAST were KSs with evidence at protein level in the UniProt database,12 ensuring that families are based on sequences with experimental data. Twenty of the 187 entries in Enzyme Commission (EC) 2.3.1 (ref. 13) are KSs, but four of them have no sequences with evidence at protein level, leaving query sequences to be retrieved from the UniProt database for the remaining 16 EC numbers. Only the KS catalytic domain of each enzyme, obtained from the Pfam database,14 was used. If no Pfam entry appeared, then a hidden Markov model built using HMMER 3.015 was used to find the KS catalytic domain.
BLAST (version 2.2.19) was downloaded and used to populate families with sequences related to the queries in the nonredundant (nr) protein sequence database,16 using an E-value of 0.001. A script automated successive BLAST runs.
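The family-population step can be sketched roughly as follows. This is a minimal illustration and not the authors' script: it assumes the legacy blastall program with tabular output (-m 8), in which the subject identifier is the second tab-separated column and the E-value is the eleventh. The function names (blast_command, parse_hits, populate_family) and the pluggable search callable are hypothetical.

```python
from typing import Callable, Iterable, List, Set

E_VALUE_CUTOFF = 0.001  # threshold stated in the text


def blast_command(query_fasta: str, database: str = "nr") -> List[str]:
    """Build a legacy blastall protein search command with tabular output."""
    return ["blastall", "-p", "blastp", "-d", database,
            "-i", query_fasta, "-e", str(E_VALUE_CUTOFF), "-m", "8"]


def parse_hits(tabular: str, cutoff: float = E_VALUE_CUTOFF) -> Set[str]:
    """Return subject IDs whose E-value is at or below the cutoff."""
    hits: Set[str] = set()
    for line in tabular.splitlines():
        fields = line.split("\t")
        # Skip comments and malformed rows (tabular output has 12 columns).
        if line.startswith("#") or len(fields) < 12:
            continue
        if float(fields[10]) <= cutoff:
            hits.add(fields[1])
    return hits


def populate_family(queries: Iterable[str],
                    search: Callable[[str], str]) -> Set[str]:
    """Run one search per query sequence and pool the accepted hits.

    `search` is any callable returning tabular BLAST output for a query,
    for example a wrapper that runs blast_command() via subprocess.
    """
    family: Set[str] = set()
    for query in queries:
        family |= parse_hits(search(query))
    return family
```

Passing the search step in as a callable keeps the pooling logic testable without a local copy of the nr database.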
MSAs with MUSCLE 3.617 and ClustalX 2.0.1218 using default parameters were conducted on a sample of sequences from each potential family or between different potential families, to determine whether the former should be split or whether the latter should be merged.
The KS catalytic domains of all KS crystal structures in each family, obtained from the RCSB Protein Data Bank,19 were superimposed in MultiProt.20 Then, using MATLAB,21 the RMSDs of the distances between α-carbon atoms of different tertiary structures were calculated.2 Tertiary structures of enzymes in the same family differ in size and number of α-carbon atoms. Therefore Pave values, indicating the average percentage of α-carbon atoms compared, were also recorded. Furthermore, superimposed tertiary structures were visually checked using PyMOL.22
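The two quantities reported for each family can be illustrated with a short sketch. It assumes the structures have already been superimposed and that the matched α-carbon pairs are known. The authors did this work in MultiProt and MATLAB, so the Python below is only an illustration. The exact definition of Pave is not spelled out here, so percent_compared uses an assumed one: the matched count as a percentage of each structure's α-carbon total, averaged over the two structures.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """RMSD over matched alpha-carbon pairs of two superimposed structures."""
    if len(a) != len(b) or not a:
        raise ValueError("need equal, non-empty lists of matched atoms")
    total = sum((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
                for (x1, y1, z1), (x2, y2, z2) in zip(a, b))
    return math.sqrt(total / len(a))


def percent_compared(n_matched: int, n_total_a: int, n_total_b: int) -> float:
    """Average percentage of alpha carbons compared (assumed definition)."""
    return 100.0 * (n_matched / n_total_a + n_matched / n_total_b) / 2.0
```

A family-level RMSDave or Pave would then be the mean of these values over all pairs of structures in the family.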
All subfamilies except those in KS3, a very large family (see below), were identified as follows. MSAs of the KS catalytic domains of all sequences in each family were constructed using MUSCLE 3.6. Phylogenetic trees were made using MEGA 4.1 or 5.0.23 They are based on the minimum evolution method24 using complete deletion of sequences, a range of 250–1000 bootstrap iterations, and Jones–Taylor–Thornton (JTT) model values.25 The number of bootstrap iterations was established on a family-by-family basis, the iteration number being reduced in some cases primarily to save computational time while still maintaining an adequate level of rigor in constructing trees of the larger KS subfamilies.
After the phylogenetic tree was complete, subfamilies were chosen based on the visual divergence of one cluster from another, as justified by bootstrap values.24 These manually chosen subfamilies were then subjected to statistical tests to determine each subfamily's z-value26 with respect to another's. The z-value is defined as:
where subscripts i and j denote subfamilies i and j, x denotes the JTT distance, σ denotes the variance, and n denotes the number of data points for each x point. This z-value determines the likelihood that a certain subfamily is part of another (the higher the z-value, the less likely that two subfamilies overlap). The minimal z-value necessary for every subfamily pairing was 3.33, equivalent to a 0.001 probability of two subfamilies being grouped together in a two-tailed test.
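As a rough illustration of this test, the sketch below assumes a standard two-sample form built from the symbols defined above, z = |x̄_i − x̄_j| / sqrt(σ_i/n_i + σ_j/n_j), with each subfamily's JTT distances treated as a sample and σ taken as the sample variance. This form is an assumption and may differ in detail from the equation actually used. The helper two_tailed_p shows that z = 3.33 corresponds to a two-tailed normal probability a little under 0.001.

```python
import math
from statistics import mean, variance
from typing import Sequence


def z_value(x_i: Sequence[float], x_j: Sequence[float]) -> float:
    """Two-sample z statistic for JTT distances of subfamilies i and j.

    Assumed form: difference of means divided by the square root of the
    summed variance-over-n terms (sigma denotes a variance, as in the text).
    """
    standard_error = math.sqrt(variance(x_i) / len(x_i) +
                               variance(x_j) / len(x_j))
    return abs(mean(x_i) - mean(x_j)) / standard_error


def two_tailed_p(z: float) -> float:
    """Two-tailed probability of |Z| >= z for a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical distance samples, for illustration only.
distances_i = [1.0, 1.1, 0.9, 1.0]
distances_j = [2.0, 2.1, 1.9, 2.0]
well_separated = z_value(distances_i, distances_j) >= 3.33
```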
To refine each subfamily, an MSA of its sequences was made after adding three to five out-group sequences from the subfamily with the highest z-value. Individual subfamily trees were constructed using MEGA 4.1 or 5.0, with tree construction based on the minimum evolution method using pair-wise deletion of sequences, 1000 bootstrap iterations, and JTT distance matrix values. These parameters were used for all KS subfamilies regardless of the bootstrap value used to construct the initial family tree. The subfamily trees were visually inspected to ensure that the out-group sequences appeared as roots. If this were not so, the subfamily was modified by either removing outlier sequences clustered near the out-group sequences or splitting it into two distinct subfamilies. JTT distance values were computed for the refined subfamilies with respect to all other subfamilies. Once this was complete, z-values were then calculated for these refined subfamilies, and they were compared with the z-values before refinement. The procedure was repeated until the required criteria were met.
KS3 has many more sequences (9585 when the tree was produced and 14,098 at present) than the other four KS families (~3000 or fewer each). MUSCLE 3.6 created an MSA through seven iterations, short of complete convergence, evidently caused by the high number of sequences. This alignment was passed to FastTree27 rather than to MEGA to create a cladogram.
Ketoacyl Synthase Families and Subfamilies
Based on these techniques, KSs are divided into five families (Table I).
Table I. Ketoacyl Synthase Families and Common Names of Their Members
Elongation of very long-chain fatty acid protein, fatty acid elongase
Nearly all KS1 members are produced by bacteria, with a few formed by eukaryota and only one from an archaeon.3 The dominant enzyme in this family is 3-ketoacyl-ACP synthase III (KAS III), also called 3-oxoacyl-ACP synthase III and β-ketoacyl-ACP synthase III, which is denoted by EC 2.3.1.180 and whose characteristic reaction is malonyl-ACP + acetyl-CoA → acetoacetyl-ACP + CO2 + CoA (Table II). KAS III enzymes are discrete proteins (not covalently linked to other FAS and PKS enzymes) that catalyze the initial condensation reaction in the type II (dissociated) fatty acid elongation cycle and are known as “loading KSs.” However, 388 sequences named this way are instead labeled EC 2.3.1.41, standing for 3-ketoacyl-ACP synthase I, whose characteristic reaction is malonyl-ACP + acyl-ACP → 3-ketoacyl-ACP + CO2 + ACP, with the product 3-ketoacyl-ACP molecule two carbon atoms longer than the reactant acyl-ACP molecule.
Table II. Ketoacyl Synthases Commonly Found in ThYme
EC number | Enzyme | Characteristic reaction
EC 2.3.1.41 | β-Ketoacyl-ACP synthase I | Malonyl-ACP + acyl-ACP → 3-ketoacyl-ACP + CO2 + ACP
EC 2.3.1.74 | Naringenin-chalcone synthase | 3 Malonyl-CoA + 4-coumaroyl-CoA → naringenin chalcone + 3 CO2 + 4 CoA
EC 2.3.1.119 | Icosanoyl-CoA synthase | Malonyl-CoA + stearoyl-CoA + 2 NAD(P)H + 2 H+ → icosanoyl-CoA + 2 NAD(P)+ + CO2 + CoA + H2O
EC 2.3.1.179 | β-Ketoacyl-ACP synthase II | Malonyl-ACP + (Z)-hexadec-11-enoyl-ACP → (Z)-3-oxooctadec-13-enoyl-ACP + CO2 + ACP
EC 2.3.1.180 | β-Ketoacyl-ACP synthase III | Malonyl-ACP + acetyl-CoA → acetoacetyl-ACP + CO2 + CoA
KS1 is divided into 12 statistically significant subfamilies (Supporting Information Table S1A, with all tables and figures denoted by S found in the Supporting Information) consisting of KSs (overwhelmingly 3-ketoacyl-ACP synthases III) produced only by bacteria, except those in Subfamily 1C, which are produced by cyanobacteria and plants (Table III). Of 2308 aligned KS1 sequences, 128 are outliers. Phylogenetic trees of the 12 subfamilies are found as Supporting Information Figures S1A–S1L.
Table III. Number of Sequences and Phyla of Producing Species within KS1 Subfamilies
Firmicutes, Fusobacteria, Proteobacteria
Bacteroidetes, Firmicutes, Proteobacteria
Actinobacteria, Firmicutes, Proteobacteria
KS1 subfamilies except subfamily 1C have members produced by one to three bacterial phyla (Table III). When members of a single subfamily are produced by bacteria in more than one phylum, some phyla (actinobacteria and firmicutes) contain Gram-positive bacteria and others (bacteroidetes, fusobacteria, and proteobacteria) contain Gram-negative bacteria.
All KS2 enzymes are from eukaryota, with nearly all from plants. 3-Ketoacyl-CoA synthases, fatty acid elongases, and very long-chain fatty acid condensing enzymes are the most common enzymes in this family. Some are defined as EC 2.3.1.119 (Table II), but the general characterization as EC 2.3.1.– is much more common. Most enzymes in this family catalyze reactions to produce very long-chain fatty acids, whose unbranched chains are longer than 18 carbon atoms.
KS2 can be divided into 10 subfamilies (Table IV, Supporting Information Tables S1B and S2). All but Subfamily 2J are composed of enzymes from plants, specifically streptophyta (Table IV); Subfamily 2J, on the other hand, has representatives produced by amoebozoa and dinoflagellata. All but Subfamily 2D have members named as above (although most sequences are undefined); Subfamily 2D has a majority of fiddlehead enzymes.
Table IV. Number of Sequences and Phyla of Producing Species within KS2 Subfamilies
KS3 is the largest KS family, containing at present 14,098 sequences. KSs here include the KS domains of large multidomain enzymes such as iterative type I FASs and modular type I PKSs. Many different enzymes are included in this family, but the largest number of members are 3-ketoacyl-ACP synthases I and II (KAS I and KAS II) and undifferentiated PKSs, with EC 2.3.1.41 and EC 2.3.1.179 as the most common EC numbers (Table II). Bacteria produce most KS3 members; eukaryota are substantial producers, and a few KS3 enzymes are of archaeal origin.
Because of the large number of sequences and the program being used, division of the KS3 family into smaller groups had to be carried out by hand from the FastTree cladogram. After controlled sampling, 14 groups were defined, with groups being composed of sequences produced by single or several phyla and usually being composed of a preponderance of enzymes with similar names (Table V and Supporting Information Table S3).
Table V. Number of Sequences and Phyla of Producing Species within KS3 Groups
Actinobacteria, Firmicutes, Proteobacteria, and other bacterial phyla
Group 3A members are produced overwhelmingly by actinobacteria, with a large majority named as PKSs (Table V). Group 3B members, on the other hand, have a mixture of names and come from mainly cyanobacteria and proteobacteria. Members of the three smaller groups 3C, 3D, and 3E are composed of a mixture of enzymes and are from animals, protozoa, and fungi, respectively. Group 3F sequences are from several bacterial phyla and have been designated with a number of names. In contrast, enzymes in 3G are overwhelmingly named 3-ketoacyl-ACP synthase I and II and are produced by proteobacteria. Group 3H has enzymes with several names from a number of protozoal and animal phyla. The three small groups 3I, 3J, and 3K contain 3-ketoacyl-ACP synthases from fungi, plants, and proteobacteria, respectively, while Group 3L members are named 3-ketoacyl-ACP synthase II and PKS and come from several bacterial phyla. Finally, Groups 3M and 3N have mainly PKSs from fungi and bacteria, respectively.
A large fraction of KS4 enzymes are from eukaryota, while the remaining ones are from bacteria. They are classified as chalcone synthases, stilbene synthases, type III PKSs, and naringenin-chalcone synthases, and overwhelmingly those that have EC numbers are listed as EC 2.3.1.74 (Table II).
There are 10 subfamilies in KS4 (Table VI, Supporting Information Tables S1C and S4). Subfamilies 4A–4C are made up of plant (streptophytal) enzymes, with members of 4D being produced by actinobacteria and phaeophyceae (brown algae), 4E coming from ascomycotal fungi, and 4F–4J being enzymes from various bacterial phyla (Table VI). Subfamily 4A has many more sequences than the other subfamilies. All subfamilies except 4B have a wide variety of synthases; 4B, on the other hand, is composed almost exclusively of chalcone synthases.
Table VI. Number of Sequences and Phyla of Producing Species within KS4 Subfamilies
Many bacterial phyla
Acidobacteria, Actinobacteria, Proteobacteria
Actinobacteria, Firmicutes, Proteobacteria
KS5 members are all from eukaryota, and most are produced by animals. Those that are characterized are almost exclusively fatty acid elongases and elongation of very long-chain (ELOVL) fatty acid proteins. At present, none has an EC number corresponding to an elongase.
KS5 has 11 subfamilies (Table VII, Supporting Information Tables S1D and S5). These subfamilies often have members from several phyla over a wide spectrum (Table VII). Only Subfamily 5B has most of its enzymes with names more specific than the two listed above; that subfamily is populated almost exclusively by fatty acid-CoA elongases. A number of vertebrates produce Subfamily 5C ELOVL1 or ELOVL7 enzymes. Insect and vertebrate ELOVL4 enzymes are found in Subfamily 5F. Subfamily 5G has many vertebrate polyunsaturated fatty acid elongases, some ELOVL5 enzymes, and a few ELOVL2 enzymes. Polyunsaturated fatty acid elongases from a number of phyla are also found in Subfamily 5H. Some vertebrate ELOVL6 enzymes occur in Subfamily 5K. In mammals, ELOVL1, 3, and 6 catalyze the elongation of saturated and monounsaturated long-chain fatty acids, while ELOVL2, 4, and 5 elongate polyunsaturated long-chain fatty acids.28
Table VII. Number of Sequences and Phyla of Producing Species within KS5 Subfamilies
Chordata, Echinodermata, Platyhelminthes
Several phyla of diatoms, brown algae, green algae, protozoa, and higher plants
Many phyla of protists, diatoms, dinoflagellates, protozoa, brown algae, and lower and higher animals
Correspondence with Earlier Ketoacyl Synthase Phylogenetic Trees
Of the seven enzymes arranged in a phylogenetic tree by Moche et al.,4 E. coli ketoacyl-ACP synthases I and II and Synechocystis sp. ketoacyl-ACP synthase II are found in KS3, alfalfa chalcone synthase is in KS4, and E. coli ketoacyl-ACP synthase III is located in KS1. S. cerevisiae degradative thiolase and Z. ramigera biosynthetic thiolase are not KSs, although they also have thiolase-like folds and slight sequence similarity with the KSs in the tree. The relative distances among the different KSs in this work are similar to those found by Moche et al.
The 20 A. thaliana KSs arranged by Blacklock and Jaworski5 all appear to be part of KS2.
All 40 of the 3-ketoacyl-ACP synthase III proteins classified by González-Mellado et al.6 are found in KS1. The enzymes produced by eudicots, monocots, diatoms, and cyanobacteria are all in Subfamily 1C, mostly in the same order as in Supporting Information Figure S1C. The other bacterial enzymes and the one protist enzyme are in various other KS1 subfamilies, as are the four algal proteins.
The mammalian elongases arranged by Leonard et al.7 are found in KS5, Subfamilies 5F, 5G, and 5K, with fungal elongases in Subfamilies 5H and 5J. All protozoal and animal fatty acid elongases in the tree published by Fritzler et al.8 are located in KS5, Subfamily 5K. Also found in KS5, Subfamily 5K, are most of the protozoal parasite elongases arranged by Lee et al.,9 although a few from yeast are in Subfamily 5J. Finally, many protist, algal, and diatom polyunsaturated fatty acid elongases are found in Subfamily 5H, with related vertebrate enzymes in Subfamily 5G.10
Of the four KS families whose members have assigned EC numbers, KS1 and KS3 members use malonyl-ACP as a chain-elongating agent, whereas KS2 and KS4 enzymes use malonyl-CoA (Tables I and II). KS1, KS2, and KS4 members add these to acyl-CoA moieties, while KS3 members add them to acyl-ACP molecules. The fatty acid elongases in KS5, so far without EC numbers, condense malonyl-CoA with acyl-CoA.28
Although KSs of various types have 20 EC entries, only five comprise the great majority of enzymes gathered by using BLAST with query sequences taken from enzymes with evidence at protein level (Table I). These numbers, assigned by groups working on KSs, are EC 2.3.1.41 (3-ketoacyl-ACP synthase I), EC 2.3.1.74 (naringenin-chalcone synthase), EC 2.3.1.119 (icosanoyl-CoA synthase), EC 2.3.1.179 (3-ketoacyl-ACP synthase II), and EC 2.3.1.180 (3-ketoacyl-ACP synthase III).13 The reactions that they characteristically catalyze are shown in Table II. These factors suggest that KS1 and KS3 contain enzymes that catalyze elongating reactions specific to short (fewer than six carbon atoms) to long (12–20 carbon atoms) acyl chain lengths, whereas enzymes in KS2 and KS5 elongate longer (usually >18 carbon atoms) acyl chains, and KS4 enzymes specifically produce chalcones and related molecules. Moreover, KS2 enzymes are produced almost exclusively by plants and KS5 enzymes come mainly from animals.
Ketoacyl Synthase Crystal Structures
All known tertiary structures of members of KS1, KS3, and KS4 have thiolase-like folds (Fig. 2), with five layers of α-β-α-β-α structure.29 KS2 and KS5 presently have no crystal structures.3 KS1 has 38 crystal structures; superposition of these structures gives an RMSDave value of 1.22 Å and a Pave value of 82.7%. The corresponding values for KS3 are 71 structures, 1.42 Å, and 67.5%, whereas those for KS4 are 41 structures, 1.18 Å, and 93.2%.
Crystal structures from KS1, KS3, and KS4, one from each family, were superimposed (Fig. 2). The RMSD of the superimposed structures is 1.96 Å, with a Pave of 68.5%.
Ketoacyl Synthase Catalytic Residues and Mechanisms
Based on crystal structures and consistent with previous results with thioesterases,2 catalytic residues are well conserved within KS1, KS3, and KS4 (Table VIII). This leads us to assume that all members of a family have the same ping-pong kinetic mechanism,30 using cysteine, histidine, and either histidine or asparagine as a catalytic triad.
Table VIII. Catalytic Residues of Ketoacyl Synthase Families
Cysteine, histidine, and asparagine form the catalytic triad in KS1. Qiu et al.31 proposed that Cys112 in E. coli β-ketoacyl-ACP synthase III (PDB entry 1HN9) donates a proton to His244 and attacks acetyl-CoA. Then, malonyl-ACP is attached to His244 and Asn274 to be decarboxylated, forming a carbanion. Finally, the carbanion attacks the acetyl moiety to form acetoacetyl-ACP.
In KS2, mutagenetic analysis of Arabidopsis FAE1 β-ketoacyl-CoA synthase strongly suggested that it shares the same ping-pong mechanism and putative Cys-His-Asn/His catalytic residues as members of KS1, KS3, and KS4, but in this case joining malonyl-CoA with a long-chain acyl-CoA.32 Although no crystal structure is yet available to provide confirmation, it appears that the catalytic residues are Cys223, His391, and Asn424, congruent with the identity and spacing of the catalytic residues in the other three families (Table VIII). No NADPH was necessary for wild-type enzyme activity, indicating that EC 2.3.1.119 is an incorrect designation for this enzyme family (Tables I and II).
The KS3 active site has a Cys-His-His triad (Table VIII). In Streptococcus pneumoniae KASII (2ALM), acyl-ACP transfers its acyl moiety to Cys164. It is proposed that a water molecule activated by His303 then attacks malonyl-ACP to form a carbanion. His337 also stabilizes the malonyl moiety. Last, the carbanion attacks the acyl moiety and forms β-ketoacyl-ACP.33
KS4 members have a Cys-His-Asn catalytic triad, the same as KS1 members (Table VIII). The chalcone synthase/stilbene synthase superfamily catalyzes the same acyl transfer, decarboxylation, and condensation steps as KS1, plus further cyclization and aromatization reactions before it forms the final chalcone product.34
Little is yet known about the catalytic mechanism of KS5 enzymes. It appears that no catalytic amino acid residues have yet been identified. MSAs have identified conserved histidine and asparagine residues, the former however in a membrane-spanning region,7 but no conserved cysteine residues (Supporting Information Fig. S2).
Ketoacyl Synthase Clans
Although amino acid sequences of members of different families may completely or almost completely differ from each other, if their crystal structures, catalytic residues, and mechanisms are conserved, they could be part of the same clan, maybe having a distant common ancestor.
KS1, KS2, and KS4 have some members whose sequences are similar. In addition, the tertiary structures of members of KS1, KS3, and KS4 may be superimposed (Fig. 2), with their presumed catalytic residues in the same positions (Fig. 3). Furthermore, their secondary structure elements are found in the same order, although with some gaps (Table IX).
Table IX. Secondary Structure Elements of Ketoacyl Synthase Families
KS1, KS2, KS3, and KS4 members have similar catalytic triads, indicating that they catalyze essentially the same basic reaction by the same or a similar mechanism. Thus, these four families fall into one clan. There is no indication at present that KS5 enzymes are part of this clan; more specifically, the fatty acid elongases in KS2, nearly all from higher plants, and those in KS5, mainly from animals and almost none from plants, appear not to be related.
The over 20,000 primary structures of the KSs have been sorted into five families, separated by different amino acid sequences and by the characteristic reactions that they catalyze. Four of the families (KS1–KS4) appear to be part of one clan because of their slight similarities of primary structure and strong similarities of secondary structure element orders, tertiary structures, placement and identity of catalytic residues, and implied mechanisms. Four of the families (KS1, KS2, KS4, and KS5) have been further split into 10–12 subfamilies each by their statistically significant differences in primary structure. Sequences of the fifth family (KS3) have been separated manually into 14 groups based on the organisms that produce them and sometimes by the reactions that they catalyze. This information should be useful to researchers in choosing specific KSs to study further.
The National Science Foundation Engineering Research Center for Biorenewable Chemicals is headquartered at Iowa State University and includes Rice University, the University of California, Irvine, the University of New Mexico, the University of Virginia, and the University of Wisconsin-Madison. R.P.M. from the University of Michigan was part of a Research Experiences for Undergraduates program associated with this Engineering Research Center. The authors thank Professor Derrick Rollins (Iowa State University) for providing the z-value equation. They also thank Professor Basil Nikolau and the members of his research group for helpful advice.