Genome-level and biochemical diversity of the acyl-activating enzyme superfamily in plants


  • Jay Shockey,

    Corresponding author
    1. USDA-ARS, Southern Regional Research Center, Commodity Utilization Research Unit, New Orleans, LA 70124, USA
      (fax +1 504 286 4367; e-mail
  • John Browse

    1. Institute of Biological Chemistry, Washington State University, Pullman, WA 99164-6340, USA



In higher plants, the superfamily of carboxyl-CoA ligases and related proteins, collectively called acyl-activating enzymes (AAEs), has evolved to provide enzymes for many pathways of primary and secondary metabolism and for the conjugation of hormones to amino acids. Across the superfamily there is only limited sequence similarity, but a series of highly conserved motifs, including the AMP-binding domain, makes it easy to identify members. These conserved motifs are best understood in terms of the unique domain-rotation architecture that allows AAE enzymes to catalyze the two distinct steps of the CoA ligase reaction. Arabidopsis AAE sequences were used to identify the AAE gene families in the sequenced genomes of green algae, mosses, and trees; the size of the respective families increased with increasing degree of organismal cellular complexity, size, and generation time. Large-scale genome duplications and small-scale tandem gene duplications have contributed to AAE gene family complexity to differing extents in each of the multicellular species analyzed. Gene duplication and evolution of novel functions in Arabidopsis appear to have occurred rapidly, because acquisition of new substrate specificity is relatively easy in this class of proteins. Convergent evolution has also occurred between members of distantly related clades. These features of the AAE superfamily make it difficult to use homology searches and other genomics tools to predict enzyme function.


In plants, fatty acid and lipid biosynthetic pathways, including those for fatty acid synthesis, the tricarboxylic acid cycle, isoprenoids, sterols, cutin, suberin, phospholipids, and triacylglycerol (TAG), all rely on coenzyme A. Many pathways of secondary product synthesis also use CoA-linked intermediates, including those leading to lignins, lignans, flavonoids and glucosinolates. Some catabolic pathways, such as fatty acid β-oxidation, also require CoA. The ubiquity of CoA in metabolism is not restricted to plants; in all living organisms it is an essential component of many primary and secondary biochemical pathways that supply core building blocks and energy for the cell. Begley et al. (2001) estimated that CoA may be involved in approximately 4% of all enzymatic reactions, either as a cofactor or linked to particular substrates. Most of these CoA-linked metabolic intermediates contain a high-energy thioester bond between the pantetheine moiety of CoA and a carboxyl group. Multiple routes exist for the activation of carboxylates with CoA, such as use of metabolic free energy by pyruvate dehydrogenase to form acetyl-CoA, or ATP and GTP synthesis by substrate-level phosphorylation through the reverse reaction of certain classes of acetyl-CoA synthetases (Sánchez et al., 2000) and succinyl-CoA synthetases (Ottaway et al., 1981). However, many carboxylate compounds are activated by CoA ligase enzymes (E.C. 6.2.1). Most of the enzymes in this class carry out a two-step mechanism. First, the carboxylic acid substrate is activated by ATP pyrophosphorolysis, forming an enzyme-bound acyl-AMP intermediate called an adenylate. In the second step, the free electrons in the thiol group of CoA (or other acyl acceptor) displace the AMP by nucleophilic attack, forming the relevant thioester and releasing AMP. While this reaction is theoretically reversible, in vivo the reverse reaction is effectively prevented by hydrolysis of the PPi released during the first step of the reaction.
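
The two-step mechanism described above can be written out as a reaction scheme. This is a reconstruction from the text, not the authors' own figure: R-COOH is a generic placeholder for the carboxylic acid substrate, and the third line is the PPi hydrolysis that pulls the sequence forward in vivo.

```latex
\begin{align*}
\text{R-COOH} + \text{ATP} &\rightleftharpoons \text{R-CO-AMP} + \text{PP}_i \\
\text{R-CO-AMP} + \text{CoA-SH} &\rightleftharpoons \text{R-CO-S-CoA} + \text{AMP} \\
\text{PP}_i + \text{H}_2\text{O} &\longrightarrow 2\,\text{P}_i
\end{align*}
```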


The plant CoA-ligases are part of the larger superfamily of adenylate-forming enzymes that include luciferases and non-ribosomal peptide synthetases. Crystal structures are available for more than 15 proteins of this enlarged family from microbes and animals (Gulick, 2009). The first plant structure, for 4-coumaroyl-CoA ligase, has recently been added to this list (Hu et al., 2010). The available structures have helped to confirm unique aspects of the catalytic mechanism that are required for the two-step reaction (Gulick, 2009). The protein structures consist of a large N-terminal domain and a small C-terminal domain with a link that contains a lysine ‘hinge’ residue. In one conformation (adenylate forming), ATP and the carboxyl substrate are able to bind and react. Once the carboxyl-AMP intermediate is formed, PPi release drives an approximately 140° rotation of the C-terminal domain to the thioester-forming conformation. This exposes a tunnel formed between the two domains that binds the pantetheine moiety of CoA and facilitates catalysis of the second reaction step. Following release of the carboxyl-CoA and AMP products, the C-terminal domain rotates back to the substrate-binding conformation. The ATP and CoA binding sites are both quite well defined and conserved. As might be expected, the binding site for the carboxyl substrate, which is in the N-terminal domain, is highly variable. For the five structurally characterized acyl-CoA synthetases, only 14 out of 250 residues in the core N-terminal domain were found to be conserved (Gulick, 2009). For this reason, it has been hard to define the residues involved in substrate binding, or to predict, even in a general fashion, the substrate specificity of individual enzymes.

An area of long-standing interest in our laboratories has been the study of the genetic evolution and biochemical diversification of a large superfamily of CoA ligase-related genes called acyl-activating enzymes (AAEs) (Shockey et al., 2003). One of the unifying features of this superfamily is the well-conserved 12 amino acid residue AMP-binding motif (PROSITE PS00455). We have previously used this feature to identify 44 genes in Arabidopsis that encode long- and short-chain acyl-CoA synthetases (Shockey et al., 2003), and related genes (Koo et al., 2005). The superfamily also contains an additional 19 Arabidopsis genes encoding enzymes that catalyze conjugation of plant hormones with amino acids (Staswick et al., 2002, 2005; Nobuta et al., 2007; Okrent et al., 2009). These enzymes do not involve formation of a thioester with CoA, but rather, use the adenylated substrate to form an amide linkage. The AMP-binding site in this subgroup of proteins contains an extra residue near the middle of the motif (Shockey et al., 2003). The use of this motif is highly predictive, since candidate proteins can be identified with a high degree of confidence, despite limited sequence homology overall. Other conserved domains and structural features were also used to separate the AAE superfamily into seven clades (Figure 1). These groupings were intended only as a preliminary attempt to cluster proteins of like function (Shockey et al., 2003), but three clades each contain proteins with closely related functions. Clade I contains long-chain acyl-CoA synthetase (LACS) proteins, along with two enzymes that ligate fatty acids to acyl-carrier protein (Koo et al., 2005), clade III contains amino-acid conjugases (Staswick et al., 2002, 2005; Nobuta et al., 2007; Okrent et al., 2009), while clade IV enzymes are 4-coumaroyl-CoA ligases (with the exception of At1g62940) (de Azevedo Souza et al., 2008; Hamberger and Hahlbrock, 2004; Hamberger et al., 2007). 
While the acronym AAE was suggested as a generally descriptive gene family name, many other nomenclatures have been used in specific cases (de Azevedo Souza et al., 2008, 2009; Koo et al., 2006).
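
Motif-based identification of the kind described above can be illustrated with a short script that converts a PROSITE-style pattern into a regular expression and scans a protein sequence for it. This is only a sketch, not the procedure used in Shockey et al. (2003). The pattern string assigned to `AMP_BINDING_PATTERN` is an assumption, quoted from memory of the PS00455 entry, and should be checked against the PROSITE database before use; the test peptide is synthetic and hypothetical.

```python
import re

# ASSUMPTION: 12-residue pattern believed to correspond to PROSITE PS00455;
# verify against the PROSITE database before relying on it.
AMP_BINDING_PATTERN = (
    "[LIVMFY]-{E}-{VES}-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-[PASLIVM]-[KR]"
)


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression.

    Handles residue sets [ABC], exclusions {ABC}, the wildcard x, single
    residues, and repetition counts such as x(2) or x(2,4).
    """
    tokens = []
    for element in pattern.rstrip(".").split("-"):
        core, _, repeat = element.partition("(")
        if core == "x":
            token = "."                       # any residue
        elif core.startswith("{") and core.endswith("}"):
            token = "[^" + core[1:-1] + "]"   # any residue except these
        else:
            token = core                      # "[ABC]" or a single residue
        if repeat:
            token += "{" + repeat.rstrip(")") + "}"
        tokens.append(token)
    return "".join(tokens)


def find_motif(sequence, pattern=AMP_BINDING_PATTERN):
    """Return (start_index, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence.upper())]


# Hypothetical synthetic peptide, not a real AAE sequence.
example_sequence = "MAAALKYTSGTTGKPKGGG"
hits = find_motif(example_sequence)
print(hits)
```

A real screen would read sequences from a FASTA file and would need a second pattern for the clade III proteins, whose motif carries an extra residue near its middle.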

Figure 1.

 Phylogenetic analysis of the Arabidopsis AAEs.
AAE protein sequences were aligned using ClustalX (version 1.8.1; Thompson et al., 1997), and alignments displayed graphically as an unrooted tree, using TreeView (version 1.6.6; Page, 1996) as described in Shockey et al. (2003). The locus identifiers for each gene are followed by the assigned names, where appropriate. The clade designations are shown in roman numerals to the right, as established in Shockey et al. (2003). The branch lengths are proportional to the degree of divergence, with the scale of ‘0.1’ representing 10% change. The AAE protein of unknown function that has not been described previously is boxed. See text for details concerning cloning, naming, and characterization of individual genes.

The past several years have seen great progress in the elucidation of the biochemical and physiological roles of representative enzymes from each of the seven clades. Many recent discoveries have refined our understanding of the general roles of previously characterized AAE proteins. Equally important though, enzymes have been identified that filled in long-standing gaps in our knowledge of well-characterized biochemical pathways, provided access to known pathways that were previously intractable to traditional genetic or biochemical approaches, and in some cases, even introduced the scientific community to reactions and pathways not previously known.

The previous studies made clear that Arabidopsis had evolved a much larger and more complex AAE gene superfamily than any of several other organisms, spanning different parts of the tree of life, whose complete genome sequences were available for analysis at that time (Shockey et al., 2003). Notably, this list of organisms did not include other plant species. The diversity of AAE gene content likely arose due to large-scale and smaller-scale changes in the size and structure of genomes over evolutionary time, resulting in duplications of many genes in all gene families, including the AAEs. Such enrichment in gene number provides an organism freedom to evolve new enzyme variants to meet new metabolic requirements and environmental challenges.

The expanding scope of studies implicating CoA ligase-related enzymes in an increasing number of interesting biological processes prompted us to revisit this area. The following review will address a number of issues, including determination of the AAE gene family size in other representative plant species, an examination of the predicted evolutionary mechanisms that led to such a large family of AAE genes in plants, and the state of knowledge regarding the various pathways in which these groups of enzymes participate. Specific case studies for representative members of several clades of the AAE superfamily will be presented. An updated phylogenetic tree showing the relationships between the different members of the AAE superfamily, including a newly-added protein of unknown function (discussed below) is shown in Figure 1.

Evolution of the Arabidopsis AAE superfamily was driven by multiple large- and small-scale gene duplication events

The advantages of using Arabidopsis as a model system are numerous and have been well chronicled over the years. Not the least of these is its small genome, which contains remarkably little repetitive DNA. As a consequence, a full complement of genes is packaged into only approximately 140 Mb, a fraction of the size of much larger plant genomes, such as the estimated 16 Gb genome of wheat. Despite the small size of the genome, the overall structure and gene content of the five Arabidopsis chromosomes are far from simple. A large percentage of the DNA content found in Arabidopsis (and many other eukaryotes) arose through small-scale, large-scale, and even whole-genome duplication events that occurred over evolutionary time. Most biologists support the view that evolution is influenced by the process of gene duplication followed by occasional retention and subsequent divergence to new function, balanced against the more common fate of loss of newly duplicated genes (Thomas et al., 2006). Such gene duplication events have played influential roles in the evolution of many types of organisms and likely helped drive diversification of the tree of life.

Large-scale duplication events

Gene duplications arise through two main types of events: large-scale duplications and formation of tandem arrays. Tandem arrays occur by duplications of small regions of a given chromosome, followed by integration nearby on the same chromosome. Tandem arrays have been observed in large numbers in the genomes of both C. elegans and Drosophila melanogaster (The C. elegans Sequencing Consortium, 1998; Adams et al., 2000). Large-scale duplications fall into two general subcategories. One is the duplication of all or a large portion of one chromosome or some number less than a full chromosomal complement, leading to aneuploidy, as likely happened approximately 70 million years ago (Mya) during the evolution of rice and related cereals (Vandepoele et al., 2003). The other is full-genome duplication of a diploid chromosome set (a process often called ‘tetraploidization’ or ‘hexaploidization’, depending on the number of duplication events), followed by selective gene loss (‘diploidization’), which seems to have occurred often in angiosperms (Otto and Whitton, 2000; Wendel, 2000; Bowers et al., 2003; Thomas et al., 2006; The French–Italian Public Consortium for Grapevine Genome Characterization, 2007; Schmutz et al., 2010), yeasts (Scannell et al., 2006), and vertebrates. One or two genome duplications occurred in an ancestral chordate during the dawn of the vertebrates, and one additional duplication took place in fish, after their split with land vertebrates, events that collectively spanned approximately 600 million years (Blomme et al., 2006; McLysaght et al., 2002).

Both types of genome rearrangements have occurred in Arabidopsis. Early analysis of the Arabidopsis genome showed that 4100 of its approximately 26 000 genes (approximately 17%) are found in more than 1500 tandem arrays (The Arabidopsis Genome Initiative, 2000). Looking at the larger-scale chromosomal or genomic level changes, a consensus has not been reached as to the specifics of how the Arabidopsis genome arrived at its current state. Much debate remains regarding which methodology is best to assess the true numbers and timing of the one to four different partial or whole-genome duplications that have occurred in Arabidopsis (The Arabidopsis Genome Initiative, 2000; Beilstein et al., 2010; Blanc et al., 2003; Bowers et al., 2003; Thomas et al., 2006; Simillion et al., 2002; Vision et al., 2000). In any case, a large amount of the unique gene content of Arabidopsis is found within a set of syntenous blocks of DNA created during the most recent polyploidization event, called the α event (The Arabidopsis Genome Initiative, 2000; Bowers et al., 2003; Thomas et al., 2006), which likely occurred during the emergence of the crucifer family about 24–86 Mya (Blanc et al., 2003; Bowers et al., 2003).

We searched for syntenous regions within the genome of Arabidopsis to assess the degree to which the AAE superfamily has diversified via partial or genome-wide polyploidization events. We used the web-based search method created by Blanc et al. (2003) to search for related genes on syntenous blocks of DNA. Tandem duplicated genes and transposable elements were removed from the data set before searching for duplicated regions in the genome to reduce the risk of generating spurious matches (Blanc et al., 2003). In this case, the reduction of AAE tandem repeats (see below) to a single representative gene reduces the 64-member AAE superfamily to 49 members. Duplications are defined here as those portions of chromosomal DNA that fall in ‘sister regions’, syntenous blocks of DNA containing six or more shared, similar genes that are retained in the same order. Using these criteria, the authors of that study found 108 pairs of blocks that covered approximately 23% of the proteome (McLysaght et al., 2002; Blanc et al., 2003). In the case of the duplicate-reduced AAE superfamily, 13 genes fall in syntenous blocks and can be confidently predicted to have arisen from large-scale chromosomal duplications. The pairings of syntenic genes, their block identifications, and other information can be found in Table 1. One of these genes probably arose during a more ancient duplication event (referred to as the β event, approximately 170–235 Mya; Bowers et al., 2003). The remaining 12 genes came about via duplications of six different blocks during the α event (The Arabidopsis Genome Initiative, 2000; Blanc et al., 2003; Bowers et al., 2003; Thomas et al., 2006). The first of these pairings is LACS4 and LACS5, which lie within blocks of 0.7 and 0.85 Mb of syntenic DNA, respectively, on different parts of chromosome IV. These blocks share 40 similar genes and are a typical example of all of the pairings of syntenous genes arising from the α event.
A graphical representation of the sister regions containing LACS4 and LACS5 is shown in Figure 2, with only the conserved similar genes shown. The lone example of a well-conserved ancient duplication (i.e. a duplication that occurred prior to the α event) occurred between LACS3 and LACS5. LACS3, LACS4, and LACS5 form a closely related branch of the LACS phylogenetic tree (Shockey et al., 2002), and the lineage of these three genes is not clear. Comparisons of the respective syntenic DNA blocks, including total DNA length, conserved gene number, and percent conserved gene number (data not shown), among the three suggest that LACS5 is the ancient parent, which duplicated to form LACS3 long before the recent duplication of LACS5 to form LACS4. However, the lack of genes orthologous to LACS3 in either Physcomitrella patens or Populus trichocarpa (see below) suggests that the patterns and timeline of gene duplication, loss, and diversification for this branch are complex and that LACS4 may pre-date LACS3.
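
The ‘sister region’ criterion (six or more shared, similar genes retained in the same order) can be illustrated with a small sketch. This is not the algorithm of Blanc et al. (2003); it assumes, for illustration only, that each block has already been reduced to an ordered list of gene-family labels, and it uses a longest-common-subsequence count as a stand-in for ‘shared genes in the same order’. The labels in the example are hypothetical, and inverted blocks are not handled.

```python
MIN_SHARED_GENES = 6  # threshold quoted in the text for sister regions


def shared_genes_in_order(block_a, block_b):
    """Count shared family labels retained in the same order.

    Computed as the length of the longest common subsequence of the two
    ordered label lists (classic dynamic-programming table).
    """
    rows, cols = len(block_a), len(block_b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if block_a[i - 1] == block_b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[rows][cols]


def are_sister_regions(block_a, block_b, minimum=MIN_SHARED_GENES):
    """Apply the 'six or more shared genes in order' rule to two blocks."""
    return shared_genes_in_order(block_a, block_b) >= minimum


# Hypothetical blocks: F1..F7 are shared families, the rest are unshared genes.
block_a = ["F1", "F2", "x1", "F3", "F4", "F5", "F6", "F7"]
block_b = ["F1", "y1", "F2", "F3", "y2", "F4", "F5", "F6", "y3", "F7"]
block_c = ["F1", "F2", "F3", "z1", "z2"]

print(shared_genes_in_order(block_a, block_b), are_sister_regions(block_a, block_b))
print(shared_genes_in_order(block_a, block_c), are_sister_regions(block_a, block_c))
```

Under this simplification the block score reported in Table 1 corresponds to the returned count, so a pair such as the LACS4 and LACS5 blocks would score 40.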

Table 1.   Syntenic AAE genes, syntenic block identifications, and block scores (in number of shared similar genes)

Gene pair           Syntenous block ID and score (in gene number)   Syntenous block age class
AtLACS5, AtLACS3    0104422101540 (19)                              Old
AtLACS5, AtLACS4    0404077901550 (40)                              Recent
AtLACS6, AtLACS7    0305033201380 (51)                              Recent
AtAAE15, AtAAE16    0304185103260 (63)                              Recent
GH3.6, GH3.5        0405189803620 (134)                             Recent
GH3.2, GH3.3        0204141701130 (31)                              Recent
At4CL1, At4CL4      0103319703610 (99)                              Recent

Figure 2.

 Graphical illustration of shared gene content in syntenous DNA blocks containing AtLACS4 (At4g23850, shown in blue text on lower right) and AtLACS5 (At4g11030, shown in red text on upper left).
Also shown are the remaining 38 shared, similar genes maintained between the two blocks. Synteny was detected using the locus identifiers of individual genes as queries to search the web-based database of Blanc et al. (2003).

LACS6 and LACS7 also arose during the α event, likely with the block containing LACS7 acting as the progenitor of the LACS6 block. AAE15 and AAE16 are also syntenous, as are two of the four 4CL genes. Our results suggested synteny between 4CL1 and 4CL4, which in turn would have been duplicated again to form 4CL2 (see below), whereas Hamberger and Hahlbrock (2004) suggest that 4CL1 and 4CL2 are syntenous and ancestral to 4CL4. Finally, four of the clade III hormone-amino acid conjugases reside in two sets of sister regions. The blocks containing At2g23170 (aka GH3.3) and At4g37390 (aka GH3.2) have one of the lowest block size scores (31) in the set described here, whereas the pair containing At4g27260 (GH3.5) and At5g54510 (GH3.6) (Staswick et al., 2005) has the highest (134), suggesting that the gene content of the latter pair of blocks has undergone relatively less ‘diploidization’. The hormone conjugases of Clade III make up the largest functional class within the AAE superfamily (19 of 64 genes), and much of the size and complexity of this group is explained by the multiple gene duplication events that have taken place in Arabidopsis. In total, the existence of at least 13 of the reduced set of 49 AAE genes (approximately 26%) can be explained via gene duplications during the α and β polyploidization events, a number that agrees very well with the determination that 23% of the entire proteome is represented in this set of blocks (Blanc et al., 2003). Other AAE gene duplications within blocks of smaller size may be present, as syntenous blocks of <6 genes are also present in much larger numbers than would be expected by random chance (Blanc et al., 2003), but are more difficult to determine with a high degree of confidence.

Small-scale duplication events

We also investigated the evolutionary history of the AAE superfamily with respect to smaller-scale events leading to tandemly duplicated genes. Shockey et al. (2003) noted several clusters of similarly annotated AAE genes, and suggested that many of these genes arose via local duplications. The groupings of AAE genes in tandem arrays were found primarily by simple visual examination of gene distribution along the chromosomal maps at the Salk Institute Genomic Analysis Laboratory (SIGnAL) website, using the gene annotations as a guide. Twenty-one of the 64 genes (approximately 33%) are present as putative duplicates, and were grouped into seven tandem arrays. Information regarding the distribution of the arrays along the Arabidopsis chromosomes can be found in Table 2. Chromosomes II and IV do not contain AAE gene arrays, while chromosome III contains only one array with two members of the At4CL gene family. Most of the tandem arrays mapped to chromosomes I and V, which is not surprising given that these two chromosomes contain the majority of AAE genes overall (27 and 14, respectively).

Table 2.   Arabidopsis AAE gene tandem arrays, chromosomal location, gene content, and genomic footprint

Tandem array   Location                    Genes                                                   Genomic footprint (kb)
1              Chromosome I, upper arm     At1g20480, At1g20490, At1g20500, OPCL1, AAE1            26.4
2              Chromosome I, upper arm     AAE10, AAE9                                             5.4
3              Chromosome I, lower arm     At1g48660, At1g48670                                    5.2
4              Chromosome I, lower arm     AAE11, AAE12, BZO1                                      106.0
5              Chromosome III, upper arm   4CL2, 4CL4                                              9.1
6              Chromosome V, upper arm     At5g13320, At5g13350, At5g13360, At5g13370, At5g13380   23.5
7              Chromosome V, upper arm     AAE5, AAE6                                              9.2
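
The arrays in Table 2 were identified by visual inspection of chromosome maps. As a rough programmatic counterpart, the sketch below flags candidate tandem arrays by grouping genes whose AGI locus numbers lie close together on the same chromosome. The approach and the `max_gap` threshold of 50 locus-number units are assumptions made for illustration; locus-number proximity is only a proxy for physical adjacency and says nothing about sequence similarity. The input list uses only locus identifiers that appear in the text.

```python
import re
from collections import defaultdict

# AGI nuclear locus identifiers look like At1g20480 (chromosome 1, locus 20480).
LOCUS_PATTERN = re.compile(r"^At([1-5])g(\d{5})$", re.IGNORECASE)


def parse_locus(locus):
    """Split an AGI locus identifier into (chromosome, locus_number)."""
    match = LOCUS_PATTERN.match(locus)
    if match is None:
        raise ValueError(f"not a nuclear AGI locus identifier: {locus!r}")
    return int(match.group(1)), int(match.group(2))


def candidate_tandem_arrays(loci, max_gap=50):
    """Group loci on the same chromosome whose numbers differ by <= max_gap.

    max_gap is a hypothetical threshold; only groups with two or more
    members are returned, ordered by chromosome and position.
    """
    by_chromosome = defaultdict(list)
    for locus in loci:
        chromosome, number = parse_locus(locus)
        by_chromosome[chromosome].append((number, locus))

    arrays = []
    for chromosome in sorted(by_chromosome):
        genes = sorted(by_chromosome[chromosome])
        current = [genes[0]]
        for gene in genes[1:]:
            if gene[0] - current[-1][0] <= max_gap:
                current.append(gene)
            else:
                if len(current) > 1:
                    arrays.append([name for _, name in current])
                current = [gene]
        if len(current) > 1:
            arrays.append([name for _, name in current])
    return arrays


# Locus identifiers quoted in the text (arrayed and non-arrayed genes mixed).
aae_loci = [
    "At1g20480", "At1g20490", "At1g20500", "At1g48660", "At1g48670",
    "At1g62940", "At2g23170", "At4g11030", "At4g23850", "At4g27260",
    "At4g37390", "At5g13320", "At5g13350", "At5g13360", "At5g13370",
    "At5g13380", "At5g35930", "At5g54510",
]
arrays = candidate_tandem_arrays(aae_loci)
for array in arrays:
    print(array)
```

With this input the three groups recovered correspond to the locus-named members of arrays 1, 3, and 6 in Table 2; confirming a real tandem duplication would still require checking sequence similarity between neighbours.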

Clearly, a large degree of the complexity of this superfamily of genes can be attributed to frequent, relatively recent small-scale gene duplication events. It remains to be seen which of the duplicated genes are functionally redundant to their forebears, and which have evolved new functions, or have acquired new temporal or spatial gene expression characteristics or other traits that will contribute to their evolutionary survival. As discussed below, there are examples where duplicates have retained substantially the same function, or alternatively have diverged in terms of expression patterns, subcellular location, or other characteristics.

AAE gene family size and organismal complexity have increased in parallel

As land plants evolved from their algal ancestors, they were faced with enormous challenges associated with adaptation to a variably dry, sessile lifestyle. The transition from unicellular or relatively simple multicellular forms to highly complex life forms required production and proper patterning of organs and tissues and development of intercellular communication networks. Such networks required development of multiple interwoven pathways for chemical and hormonal communication and interaction between cell and tissue types within the organism as well as with pollinators (in the case of flowering plants), pathogens, herbivores, and other species in the environment. Exposure to nonaquatic environments also required production of intracellular and extracellular deterrents and barriers to protect against water and nutrient loss, UV light, and temperature extremes. Many of the known or suspected enzymatic products of the AAE superfamily are necessary components of the pathways that developed to fill these roles. Therefore, we sought to learn more about the numbers and complexity of the AAE gene families in three relatively distant plant species whose genomes have been sequenced since 2003: Chlamydomonas reinhardtii (a unicellular green alga; Merchant et al., 2007), Physcomitrella patens (a moss; Rensing et al., 2008), and Populus trichocarpa (black cottonwood tree; Tuskan et al., 2006).

The protein sequences of multiple Arabidopsis AAEs representing several clades of the superfamily, including LACS3, AAE8, AAE18, 4CL1, and JAR1, were used to search the proteomes of C. reinhardtii, P. patens, and P. trichocarpa using the BLAST function available at the Phytozome version 6.0 site, which is maintained by the United States Department of Energy Joint Genome Institute and the University of California Berkeley Center for Integrative Genomics. Individual candidate genes from each list were reanalyzed by BLAST comparison to the Arabidopsis proteome (TAIR10) to find their closest match. A summary of the numbers of AAE proteins, and their segregation into each of the seven clades described in Shockey et al. (2003), is shown in Table 3. Unrooted phylogenetic trees displaying the relationships of the AAE genes in P. patens and P. trichocarpa are shown in Figures 3 and 4, respectively.
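
The second step above, assigning each candidate its closest Arabidopsis match, can be sketched for the case where BLAST results are available as standard 12-column tabular output (BLAST+ `-outfmt 6`). The searches described here were run through web interfaces, so this is an assumed offline equivalent, not the authors' workflow. The query names and all scores in the example are hypothetical; the subject identifiers are the LACS4 and LACS5 loci named in the text.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = (
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
)


def best_hits(tabular_text):
    """Return {query: (best_subject, bitscore)} using the highest bit score."""
    best = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        record = dict(zip(BLAST_COLUMNS, row))
        score = float(record["bitscore"])
        query = record["qseqid"]
        if query not in best or score > best[query][1]:
            best[query] = (record["sseqid"], score)
    return best


# Hypothetical results: two candidate genes searched against Arabidopsis.
example_rows = [
    ("candidate_1", "At4g11030", "58.2", "650", "260", "5", "10", "655", "8", "660", "1e-150", "470"),
    ("candidate_1", "At4g23850", "61.0", "655", "245", "4", "8", "658", "6", "661", "1e-165", "512"),
    ("candidate_2", "At4g11030", "55.4", "640", "275", "6", "12", "648", "9", "652", "1e-140", "455"),
]
example_text = "\n".join("\t".join(row) for row in example_rows)
closest = best_hits(example_text)
print(closest)
```

Ranking by bit score rather than E-value avoids ties when several E-values underflow to zero; either choice is a convention rather than something specified in the text.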

Table 3.   Estimated total number and clade-specific distribution of AAE genes in sequenced genomes of green algae, moss, Arabidopsis and black cottonwood tree

Organism                    Total no. AAE genes   Clade I   Clade II   Clade III   Clade IV   Clade V   Clade VI   Clade VII   Other*
Chlamydomonas reinhardtii   14                    5         5          0           0          0         1          2           1
Physcomitrella patens       37                    13        6          2           6          4         2          4           0
Arabidopsis thaliana        64                    11        3          19          5          8         14         3           1
Populus trichocarpa         77                    12        4          14          7          13        20         3           4

*See text for additional details.

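
The counts in Table 3 lend themselves to a quick arithmetic check. The sketch below transcribes the table and confirms that the clade counts in each row sum to the reported total; the assignment of digits to columns is an assumption where the flattened source is ambiguous, and this check is what constrains it. A small helper also reports the share of a family that falls in a given clade.

```python
CLADES = ("I", "II", "III", "IV", "V", "VI", "VII", "Other")

# organism -> (reported total, counts per clade in the order of CLADES)
TABLE_3 = {
    "Chlamydomonas reinhardtii": (14, (5, 5, 0, 0, 0, 1, 2, 1)),
    "Physcomitrella patens": (37, (13, 6, 2, 6, 4, 2, 4, 0)),
    "Arabidopsis thaliana": (64, (11, 3, 19, 5, 8, 14, 3, 1)),
    "Populus trichocarpa": (77, (12, 4, 14, 7, 13, 20, 3, 4)),
}


def totals_consistent(table):
    """Map each organism to True if its clade counts sum to the total."""
    return {name: sum(counts) == total for name, (total, counts) in table.items()}


def clade_share(table, organism, clade):
    """Fraction of an organism's AAE genes that fall in the given clade."""
    total, counts = table[organism]
    return counts[CLADES.index(clade)] / total


checks = totals_consistent(TABLE_3)
for organism, ok in checks.items():
    print(f"{organism}: {'consistent' if ok else 'MISMATCH'}")
print(f"Arabidopsis clade III share: {clade_share(TABLE_3, 'Arabidopsis thaliana', 'III'):.1%}")
```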
Figure 3.

 Phylogenetic comparison of the AAE gene family of Physcomitrella patens.
Unrooted trees were generated as described in the legend for Figure 1. Clades are marked as designated for Arabidopsis (Shockey et al., 2003). Listings of types and numbers of best Arabidopsis hits found within each clade are also shown. The branch lengths are proportional to the degree of divergence, with the scale of ‘0.1’ representing 10% change.

Figure 4.

 Phylogenetic comparison of the AAE gene family in black cottonwood tree (Populus trichocarpa).
Unrooted trees were generated as described in the legend for Figure 1. Clades are marked as designated for Arabidopsis (Shockey et al., 2003). Listings of types and numbers of best Arabidopsis hits found within each clade are also shown. Proteins without close orthologs in Arabidopsis are boxed. The branch lengths are proportional to the degree of divergence, with the scale of ‘0.1’ representing 10% change.

Clearly, the AAE gene family expanded and diversified as land plants evolved from their smaller, simpler forebears. The degree of overall gene family expansion seems to have varied widely in each lineage, as have the specific types of genes that have undergone duplication or loss. The genome of C. reinhardtii is comparable in size with that of Arabidopsis (Bowman et al., 2007; Merchant et al., 2007), yet contains only 14 AAE genes. A single copy of a gene orthologous to LACS7 is present, likely for activation of fatty acids for β-oxidation. Even though C. reinhardtii is photosynthetic, it does not contain a gene related to AtLACS9, the primary plastidial LACS isoform thought to play a role in activation of de novo synthesized fatty acids in higher plants (Schnurr et al., 2002). Instead, it contains three proteins most closely related to either LACS4 or LACS5. While LACS4 is targeted to ER membranes in Arabidopsis (D. Jessen, J. Knüfer, M. Hoppert, A. Polle, A. Olbrich and M. Fulda, unpublished data), the targeting of LACS5 has not yet been determined. One or more of these ancient LACS4 or LACS5 relatives may then have been recruited for plastidial fatty acid activation in this organism. It also does not contain genes in clades III, IV, and V, whose members catalyze activation of hormones (using amino acids), hydroxycinnamic acids, and jasmonic acid precursors (and other fatty acids), respectively. These findings are not surprising, given that the biochemical pathways in which clade III–clade V enzymes participate are land plant-specific processes (de Azevedo Souza et al., 2008). Chlamydomonas reinhardtii contains one gene with weak homology to BZO1 (Kliebenstein et al., 2007); this gene is the only representative of clade VI, suggesting that this clade is not just plant-specific as previously suggested (Shockey et al., 2003), but specific to multicellular land plants.
The most dramatic expansion of gene copy number in this alga involved ACS, the acetyl-CoA synthetase; four proteins sharing >57% amino acid identity with the single-copy Arabidopsis ACS are present in C. reinhardtii. This organism also contains a large (>3290 amino acid residue) type I polyketide synthase gene. Polyketide synthase (PKS) enzymes are large multi-domain complexes that resemble the fatty acid synthases, and usually contain an AMP-binding motif. While type I PKS enzymes are typically found only in bacteria and fungi, they have recently been discovered in a number of chlorophyte protist species, including C. reinhardtii (John et al., 2008).

The bryophyte Physcomitrella patens contains at least 37 AAE genes with representatives in each of the seven defined clades (Figure 3), within a genome of nearly 0.5 Gb. This species has only two genes in clade VI. It also contains only two genes related to clade III hormone-amino acid conjugases. A rudimentary system for auxin signaling is likely present in P. patens (Paponov et al., 2009); the two Clade III genes probably provide the necessary amino acid conjugation capacity required in this organism. The presence of only two genes from each of these two clades suggests that higher order complexity of these groups of proteins has mirrored organismal complexity quite closely. Physcomitrella patens contains approximately the same number of true 4CL and 4CL-like genes (clades IV and V; Figure 3) as seen in Arabidopsis, including a group of four 4CL-like genes that appear to be bryophyte-specific, as discussed in detail in de Azevedo Souza et al. (2008). This species contains three genes closely related to LACS9. Subcellular targeting studies would be necessary to determine if all three isozymes are targeted to plastids, but it is tempting to speculate that evolution of LACS9 genes, and recruitment of LACS9 proteins to plastids, could be one of the hallmarks separating algae from land-based plant lineages. One LACS6 and two LACS7 homologues are also present in P. patens. Similar to the Arabidopsis orthologues, the P. patens LACS6 protein contains a PTS2 peroxisomal targeting signal. One of the two putative LACS7 proteins has maintained its C-terminal PTS1 peroxisomal targeting signal while the other putative LACS7 isozyme has lost the PTS1 but maintained a near-perfect PTS2 near its N-terminus (data not shown), as observed for Arabidopsis LACS7 (Fulda et al., 2002). 
Physcomitrella patens also contains four genes whose highest Arabidopsis BLAST scores are to LACS4, suggesting that LACS4, rather than LACS3, may be the ancestral parent to the LACS3/4/5 branch in clade I (see Table 1 and discussion above). Few of the AAE genes in P. patens appear to have arisen through tandem duplications; these data are consistent with the finding that only approximately 1% of the protein coding genes in this organism have arisen through this type of event (Rensing et al., 2008). Rather, most of the metabolic genes in P. patens arose through whole genome duplication, followed by substantial retention of the duplicated genes (Rensing et al., 2007).

The black cottonwood tree (Populus trichocarpa) has a genome size of approximately 480 Mb (Tuskan et al., 2006), and contains at least 77 AAE genes (Figure 4). All seven major clades are represented in P. trichocarpa, with the majority residing in clades I, III, V, and VI (12, 14, 13, and 20 genes, respectively; see Table 3). Clearly, the biochemistry and subcellular localization of pathways associated with regulation of hormone levels via amino acid conjugation (Staswick et al., 2005) and peroxisomal fatty acid metabolism (Koo et al., 2006; Kliebenstein et al., 2007; Kienow et al., 2008) became much more complex during the period of time separating the evolution of bryophytes and angiosperms, as evidenced by the large number of clade III and clade VI enzymes in both Arabidopsis and black cottonwood. Several LACS genes also seem to have evolved during this period of time; LACS1, LACS2, and LACS8 proteins are present in Arabidopsis and black cottonwood, but are not found in the moss P. patens. Both types of gene duplications have occurred in P. trichocarpa. At least two whole-genome duplications have contributed at least 8000 pairs of genes that have survived, and tandem gene duplications have also accounted for approximately 11% of the extant gene content in this species (Tuskan et al., 2006). Based on the annotation of the P. trichocarpa proteome contained in the Phytozome database, it appears that 16 of the 77 genes in this species arose via tandem gene duplications. Two of the tandemly duplicated genes are included in the set of four genes that do not fall into any of the previously designated seven clades (shown boxed in Figure 4). One is related to At5g35930, an Arabidopsis gene with a diverged AMP-binding motif not characterized previously (see also Figure 1). At5g35930 and its P. trichocarpa ortholog are related to aminoadipate semialdehyde dehydrogenases, a class of lysine metabolic enzymes not yet described in higher plants (Torruella et al., 2009). The other three genes do not have orthologues in Arabidopsis, but are conserved in various other plants including rice, sorghum, and castor bean. The precise biochemical functions and physiological roles of these four proteins are unknown, but they clearly provide new opportunities to explore potentially novel aspects of plant metabolism.
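The gene counts quoted above for black cottonwood can be put side by side with a few lines of arithmetic. The following is only an illustrative sketch: the numbers are those given in the text, the variable and function names are our own, and comparing the tandem share of the AAE family with the genome-wide figure of approximately 11% is a rough comparison rather than a statistical test.

```python
# Counts quoted in the text for Populus trichocarpa AAE genes.
TOTAL_AAE_GENES = 77
TANDEM_AAE_GENES = 16
MAJOR_CLADE_COUNTS = {"I": 12, "III": 14, "V": 13, "VI": 20}
# "approximately 11%" of extant gene content (Tuskan et al., 2006).
GENOME_WIDE_TANDEM_FRACTION = 0.11


def fraction(part: int, whole: int) -> float:
    """Return part/whole, rejecting a non-positive denominator."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Share of AAE genes attributed to tandem duplication.
aae_tandem_fraction = fraction(TANDEM_AAE_GENES, TOTAL_AAE_GENES)
# Share of AAE genes that sit in the four largest clades.
major_clade_fraction = fraction(sum(MAJOR_CLADE_COUNTS.values()), TOTAL_AAE_GENES)
# Rough ratio of the AAE tandem share to the genome-wide tandem share.
tandem_ratio = aae_tandem_fraction / GENOME_WIDE_TANDEM_FRACTION

print(f"AAE genes from tandem duplication: {aae_tandem_fraction:.1%}")
print(f"AAE genes in clades I, III, V, VI: {major_clade_fraction:.1%}")
print(f"Ratio to genome-wide tandem share: {tandem_ratio:.1f}")
```

On these figures, roughly one in five AAE genes in this species is tandemly duplicated, which is somewhat under twice the genome-wide proportion.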

Connecting the evolution and biochemistry of AAE enzymes

Plastidial fatty acid biosynthesis and activation

Free fatty acids are relatively chemically inert, and as such, are rarely used as substrates. As mentioned above, activation of free fatty acids to their coenzyme A (CoA) derivatives is essential prior to use as substrates in the diverse biochemical pathways in which they are needed. Long chain fatty acyl CoAs are primarily made by the enzymes found in Clade I of the AAE superfamily, most of which are long-chain acyl-CoA synthetases (LACS) (Shockey et al., 2002). Over the past several years, many laboratories have investigated the roles of specific LACS genes and proteins from a variety of direct and indirect angles.

Fatty acids are the building blocks for a wide variety of phospholipids, triacylglycerols, sterols, waxes, cutin, suberin, acyl sugars, and other cellular components. Essential to the biosynthetic pathways for any of these compounds however is the initial production of the fatty acids themselves. Fatty acid biosynthesis in plants is primarily a plastid-localized pathway, with only a few specialized types of fatty acids produced in mitochondria (Yasuno and Wada, 2002). The plastid localization of fatty acid synthesis means that plants also must have evolved mechanisms to control export of fatty acids from plastids. Plastid envelope-associated LACS activity was demonstrated more than 30 years ago (Roughan and Slack, 1977), and later, specifically associated with the outer envelope (Andrews and Keegstra, 1983; Block et al., 1983). Kinetic labeling experiments demonstrated unequivocally that LACS is a necessary component for export of activated fatty acids. Free fatty acids are likely maintained as a discrete pool that is channeled directly to LACS enzyme in the outer envelope, possibly by means of transporters, lipid-binding proteins or other factors (Koo et al., 2004). The combined action of LACS and a transporter protein might resemble the LACS-dependent process of fatty acid transport known as ‘vectorial transport’ found in E. coli (Black et al., 1992; DiRusso and Black, 1999) and baker’s yeast (Johnson et al., 1994).

Among the nine Arabidopsis LACS genes, gene expression analysis suggested LACS9 as a good candidate for a plastidial LACS involved in de novo fatty acid synthesis (Schnurr et al., 2002). Subcellular targeting studies demonstrated that LACS9 associates strongly with chloroplast membranes. In addition, plastidial targeting of LACS9 did not involve proteolytic processing (Schnurr et al., 2002), consistent with findings for other outer envelope-associated proteins (Tranel et al., 1995). The role of LACS9 was investigated in more detail using reverse genetics. A lacs9 mutant line was identified and used to compare rates of plastid-associated LACS activity between it and chloroplasts from wild-type leaves. Despite a 90% reduction of LACS activity in lacs9-1 plastids, growth, appearance, and lipid composition were unaffected (Schnurr et al., 2002). This apparent paradox is explained by measurement of carbon flux: the remaining 10% of LACS activity is sufficient to maintain synthesis of cellular membranes and other sinks for activated fatty acids in young Arabidopsis tissues (Schnurr et al., 2002).

The identity of the isozyme responsible for residual plastid envelope LACS activity in lacs9 mutants remains elusive. LACS8 is 78% similar to LACS9. However, recent studies found that LACS8 is targeted to ER membranes (Dunkley et al., 2006; Zhao et al., 2010). Indeed, homozygous lacs8 lacs9 double mutant plants were phenotypically normal, ruling out the possibility that LACS8 accounts for the residual 10% LACS activity in chloroplasts (Zhao et al., 2010). These studies did not confirm the precise role of LACS8, but did demonstrate that the loss of LACS8 activity does not affect the supply of acyl-CoAs used for TAG synthesis in Arabidopsis seed, and suggest that LACS8 and LACS9, despite their homology, have undergone divergent functional evolution.

In certain microorganisms, some fatty acid metabolic pathways have evolved to use acyl carrier protein (ACP) as the acyl acceptor in the process of fatty acid activation, rather than CoA (Rock and Cronan, 1981; Gangar et al., 2001). Arabidopsis (Koo et al., 2005) and species of Synechocystis and Synechococcus (Kaczmarzyk and Fulda, 2010) also contain AAE enzymes that specifically utilize ACP. While Arabidopsis obviously has several other enzymes specific for CoA, the cyanobacteria rely on their acyl-ACP synthetases for all reactions that require activated fatty acids. The Arabidopsis enzyme AAE15 is a plastid-localized acyl-ACP synthetase. AAE15 and its close relative AAE16 are most strongly expressed in senescing, dehydrated, or cold-stressed tissues, suggesting a role for membrane acyl editing (Koo et al., 2005); likewise, gene knockout analysis in cyanobacteria suggests that their acyl-ACP synthetases exist in part to recapture free fatty acids released from membrane lipids (Kaczmarzyk and Fulda, 2010).

Cutin synthesis

Cutin and waxes are critically important to plant cell structure and function. Cutin is a lipophilic barrier primarily made up of mono- and dihydroxy long-chain fatty acids and related derivatives. The presence of carboxyl groups and hydroxyl groups facilitates formation of the complex and highly crosslinked polyester structures typical of cutin. Cuticular wax is a mixture of compounds derived from long- and very long-chain fatty acids, including alkanes, wax esters, aldehydes, ketones, and primary and secondary alcohols. The wax components are embedded within and overlay the surface of the cutin. Together, cutin and cuticular waxes are often called the cuticle. Plant morphology is largely dependent on the timing of cuticle synthesis and deposition. The cuticle also plays an important role in the protection of plants from biotic and abiotic stresses. Because the cuticle is assembled from activated fatty acids, the LACS enzymes (Clade I in Figure 1) are attractive targets for many labs investigating cutin polymer synthesis. Until recently, analysis of cutin in particular had been difficult to approach due to the complex cross-linked structures of these compounds. Genetic analysis has begun to reveal several interesting new mutants that are defective in various steps of the biosynthetic pathways; these discoveries have shed new light on how cutin and waxes are made in plant cells (reviewed in Kunst and Samuels, 2009; Nawrath, 2006; Pollard et al., 2008).

LACS2 has received a great deal of attention in recent years, in large part due to its importance in cutin biosynthesis. Reverse genetic studies with LACS2 detected defects in the cuticular barrier of lacs2 leaves, among other defects. Abaxial surface cutin layers of lacs2-1 leaves were thinner than those of wild-type (Schnurr et al., 2004). The lacs2-1 plants also shared many pleiotropic phenotypes with other cutin mutants, including reduced seed production and lower rates of seedling germination and establishment (Lolle et al., 1997; Sieber et al., 2000; Kurdyukov et al., 2006a,b). Additional insight into the potential role of LACS2 in cutin production came from in vitro enzyme assays using 16-hydroxypalmitate, a well-known precursor of some types of cutin monomers. Among several LACS enzymes tested, only LACS2 effectively used 16-hydroxypalmitate as a substrate (Schnurr et al., 2004).

The higher permeability of the cuticle in lacs2 mutants also alters the ability of mutant plants to respond normally to biotic and abiotic stresses. In a study by Bessire et al. (2007), a defective allele of LACS2 allowed increased diffusion of signals and effector molecules across the cuticle, resulting in a strong increase in resistance to infection by the necrotrophic fungus Botrytis cinerea. These findings led the authors to suggest that the defective cuticle barrier causes an altered perception of the plants’ environment, leading to physiological changes (Bessire et al., 2007). Another group found that LACS2 is defective in a mutant called sma4 (symptoms to multiple avr genotypes 4), which was also highly resistant to B. cinerea. Interestingly, sma4 also displayed increased sensitivity to, and severe disease symptoms resulting from, infection by normally avirulent strains of Pseudomonas syringae pv tomato (Tang et al., 2007). Other mutants with defects in cutin integrity also display increased sensitivity to infection by P. syringae (Xiao et al., 2004). A change in cutin ultrastructure, such as that found in sma4, could cause leakage of cell fluids into the substomatal chamber and alleviate the water stress that typically limits infection by P. syringae (Wright and Beattie, 2004; Tang et al., 2007).

Recent studies examining the molecular connections between osmotic stress, nutrient sensing, and lateral root formation also revealed an unexpected role for cutin integrity in general, and LACS2 specifically. Plants grown in culture, under mild osmotic stress conditions, generally repress lateral root formation (Deak and Malamy, 2005; MacGregor et al., 2008). Osmotic stress perception by the root initiates an abscisic acid–dependent signal to aerial tissues, leading to a reduction in their permeability. In turn, sucrose uptake from the media is reduced, ultimately leading to a repression of lateral root formation. These authors found that mutations that increase permeability in the cutin of aerial tissues also cause reductions or elimination of the repression of lateral root formation. Analysis of one such mutant, lrd2 (lateral root development 2), linked the increased lateral root formation phenotype to a point mutation in LACS2 (Deak and Malamy, 2005; MacGregor et al., 2008).

Production of cuticular waxes

Parallel to the recent examinations of cutin biosynthesis, many studies over the past 20 years attempted to dissect the genetics and biochemistry of cuticular wax biosynthesis. Wax production requires a large number of different enzyme activities working in concert to produce a suite of different fatty acid-derived compounds (Kunst and Samuels, 2009). Although many details are still unknown, wax synthesis is fed by very long-chain fatty acyl-CoA (VLCFA-CoA) esters produced from the fatty acid elongase complex, which elongates typical C18 fatty acids to chain lengths of C20–C30. At that point, the pathway splits into two branches known as the decarbonylation and fatty acyl reduction pathways. Previous studies showed that the cer8 mutant has elevated amounts of C26–C30 free fatty acids in its stem and leaf cuticular waxes (Jenks et al., 1995; Goodwin and Jenks, 2005). Recently, the cer8 locus was mapped, and found to encode LACS1 (et al., 2009). Additional mutant alleles of lacs1 have also been identified and characterized in other laboratories (Weng et al., 2010; D. Jessen, J. Knüfer, M. Hoppert, A. Polle, A. Olbrich and M. Fulda, unpublished data). The total wax load on lacs1 mutant stems is modestly reduced, primarily due to large reductions in the main components of the decarbonylation pathway, while free fatty acids (chain length > C24) are increased relative to wild-type (et al., 2009; Weng et al., 2010). Characterization of LACS1 enzyme activity showed that C30 and C16 fatty acids were the two best substrates, thus effectively explaining the decrease in C16 cutin monomers (see below) and C29 wax decarbonylation products, and the increase in saturated C30 fatty acids that would presumably serve as the precursors to the C29 components (et al., 2009).

LACS1 does not contain a closely related orthologue within the Arabidopsis LACS gene family. Its closest relative is LACS2, which shares approximately 65% amino acid identity with LACS1. Both genes (along with LACS3) are highly expressed in young stem epidermal tissues (Suh et al., 2005) and share other gene expression characteristics (Kosma et al., 2009). A gene orthologous to LACS3 is the third most highly expressed gene in the epidermis of citrus fruit (Matas et al., 2010), suggesting that various LACS proteins may perform important roles in cuticle production in many tissue and organ types in many plant species. Double-mutant lines of Arabidopsis lacking functional LACS1 and LACS2 were generated to search for additive effects of the two mutations. Double mutants had strongly altered wax precursor and cutin monomer profiles (et al., 2009; Weng et al., 2010). For example, lacs1-1 lacs2-3 stems had only 17% of the amount of cutin monomers, and 33% of the total stem wax load of wild-type plants (et al., 2009). Double mutant lines had reduced fertility, leading to severe reductions in seed set, and the seeds that were produced germinated poorly relative to wild-type or either single mutant parent (Weng et al., 2010). Many pleiotropic phenotypes not found in either single mutant are present in lacs1 lacs2 double mutants, including increased transpiration rates (leading to increased sensitivity to drought stress), and abnormal flower development (et al., 2009; Weng et al., 2010). These observations indicate that LACS1 and LACS2 act together to maintain the functions of the cuticle barrier, by proper maintenance of both the cutin and wax biosynthetic pathways. These findings suggest more genetic overlap in these two pathways than previously considered.

Lipids of the pollen coat

Lipids are also a necessary part of the chemical environment required for proper pollination and fertilization (Wolters-Arts et al., 1998). Pollen coat, or tryphine, contains a diverse set of lipids and proteins. During the late stages of pollen development, tapetum cells undergo programmed cell death and secrete their tryphine components, thus forming a complex lipid coat on the outer surface of pollen grains (Hsieh and Huang, 2005; Ma, 2005). Upon contact between compatible pollen and stigma, tryphine components contact the surface of the stigma and establish a contact zone. Water is channeled to the pollen through this zone, leading to pollen hydration, germination, and pollen tube growth into the stigma. Many cer mutants (and others like them) interfere with the synthesis of VLCFAs or derivatives and display conditional male sterility phenotypes (reviewed in Samuels et al., 2008). In many cases, growth under increased humidity restores fertility, likely by compensating for the defect in pollen hydration caused by the improper composition of tryphine.

Initial studies revealed that several members of the LACS clade (Shockey et al., 2002) and other members of the AAE superfamily (Shockey et al., 2003) are expressed in flowers of Arabidopsis. Fulda and colleagues (D. Jessen, J. Knüfer, M. Hoppert, A. Polle, A. Olbrich and M. Fulda, unpublished data) have extended this analysis to look more closely at the roles of LACS activity in pollen coat formation. Pollen isolated from lines lacking LACS1 had lower levels of most pollen lipid components. Pollen from lacs1 plants was normal in appearance, yet had moderately reduced fertility. In contrast, pollen of lacs4 mutants showed a significant overaccumulation of many tryphine components. Quite surprisingly, though, pollen from lacs4 mutants displayed gross morphological alterations, yet maintained near-normal levels of fertility relative to wild-type. lacs1 lacs4 double mutant lines were completely sterile under normal conditions, but could be rescued by high humidity. Biochemical studies of the tryphine showed severe reductions in wax components in pollen from the lacs1 lacs4 double mutant. Paradoxically, addition of the lacs1 mutation into the lacs4 mutant background restored normal pollen morphology. The restoration of morphology suggests that during pollen development, LACS1 acts upstream of LACS4 and produces an intermediate that must be acted upon by LACS4. Buildup of this intermediate in lacs4 lines affects the morphology of pollen grains, yet does not impede normal pollen-stigma interactions and other downstream fertilization events (D. Jessen, J. Knüfer, M. Hoppert, A. Polle, A. Olbrich and M. Fulda, unpublished data). The identity of this intermediate is not yet known. In any case, these results indicate that LACS1 and LACS4 act cooperatively in the synthesis of pollen coat lipids, similar to the situation seen with LACS1 and LACS2 in the synthesis of cutin and cuticular waxes.

Other AAE proteins have also been implicated in pollen lipid biosynthesis. Wang and Li (2009) identified two cDNA clones that are expressed primarily in anthers of cotton. Both proteins are most closely related to LACS4. Qualitative real-time PCR, promoter:GUS expression analysis, and in situ hybridizations revealed strong gene expression in several cell types of cotton anthers during the early stages of pollen development, including pollen mother cells and microspores. Down-regulation by RNAi led to reductions in LACS enzyme activity levels and long chain acyl-CoA pool sizes in whole anthers. Male sterility was observed in most transgenic lines, likely due to abnormal microsporogenesis during early anther development (Wang and Li, 2009).

Fatty acid β-oxidation

Another well-known pathway of fatty acid metabolism is the catabolic process of β-oxidation. As a highly reduced form of carbon, fatty acids are rich in energy. Oxidation of acetyl-CoA by the TCA cycle in mitochondria is an essential pathway for conversion of this energy into a usable form (ATP). The acetyl-CoA is derived from long chain acyl-CoA by the enzymes of β-oxidation localized in peroxisomes. In plants, this process is especially important during periods of rapid lipid mobilization (Fulda et al., 2002; Graham, 2008), and in starch-depleted tissues after periods of extended darkness (Kunz et al., 2009).

Visual inspection of the amino acid sequences of the Arabidopsis LACS enzymes indicated that both LACS6 and LACS7 contained peroxisome targeting sequences (PTS). LACS6 contained a type 2 PTS (PTS2) near its N-terminus. LACS7 was unusual in that it contained both a functional PTS2 and a prototypical type-1 PTS tripeptide–S-K-L at its C-terminus. The PTS2 in LACS6 and both types of PTS in LACS7 were functional, and accurately targeted transiently expressed LACS:fluorescent protein fusions to peroxisomes of onion epidermal cells (Fulda et al., 2002). The presence of both types of PTSs on a single protein may help to ensure rapid and efficient import into peroxisomes, even during periods of very high protein flux, during which access to the peroxisomal receptors may be limiting. Peroxisomal import of LACS7 was further confirmed by yeast two-hybrid analysis. The C-terminus of the LACS7 protein interacts strongly with the tetratricopeptide repeat (TPR) domains of PEX5, the peroxisomal PTS1 receptor (Bonsegna et al., 2005). No interaction was observed between LACS7 and PEX7, the PTS2 receptor, but the bait construct design may have obscured access to the N-terminus of LACS7, thus blocking the putative PTS2 (Bonsegna et al., 2005).
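The kind of sequence inspection described above can be sketched in a few lines. This is a minimal illustration, not the procedure used in the cited work: the PTS1 check looks only for the prototypical C-terminal S-K-L tripeptide named in the text, the PTS2 pattern is the commonly cited consensus (R/K)(L/V/I)x5(H/Q)(L/A), which is our assumption rather than something stated here, and the 40-residue N-terminal window and the toy sequences are hypothetical.

```python
import re
from typing import Optional

# Prototypical type-1 peroxisome targeting tripeptide named in the text.
PTS1_TRIPEPTIDE = "SKL"
# Assumption: commonly cited PTS2 consensus (R/K)(L/V/I)x5(H/Q)(L/A).
PTS2_PATTERN = re.compile(r"[RK][LVI].{5}[HQ][LA]")


def has_pts1(sequence: str) -> bool:
    """True if the protein ends with the prototypical S-K-L tripeptide."""
    return sequence.upper().endswith(PTS1_TRIPEPTIDE)


def find_pts2(sequence: str, window: int = 40) -> Optional[int]:
    """Return the 0-based start of a PTS2-like motif in the N-terminal window.

    The window size is a hypothetical choice reflecting that PTS2 signals
    lie near the N-terminus; None is returned if no motif is found.
    """
    match = PTS2_PATTERN.search(sequence.upper()[:window])
    return match.start() if match else None


# Hypothetical toy sequences, not real LACS protein sequences.
toy_both = "MSTRLAAAAAHLGGGGSKL"       # PTS2-like motif and C-terminal SKL
toy_pts2_only = "MSTRLAAAAAHLGGGGAAA"  # PTS2-like motif only
toy_neither = "MSTAAAAAAAGGGG"         # no recognizable signal

for name, seq in [("both", toy_both), ("pts2_only", toy_pts2_only), ("neither", toy_neither)]:
    print(name, has_pts1(seq), find_pts2(seq))
```

A pattern match of this kind only nominates candidates; as the text notes, the functionality of each signal was established experimentally with fluorescent protein fusions.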

Both LACS6 and LACS7 are up-regulated during periods of rapid lipid mobilization (Fulda et al., 2004). Mutational analysis revealed functional overlap between these two proteins, as T-DNA knockout lines of either gene alone were indistinguishable from wild-type, while the lacs6-1 lacs7-1 double mutant was specifically defective in seed lipid mobilization and required exogenous sucrose for seedling establishment (Fulda et al., 2004). The double mutant bore some phenotypic similarities to other mutants defective in lipid mobilization and seedling germination and/or establishment, especially those mutants deficient in the ABC transporter called either PXA1, PED3, or CTS1 (Zolman et al., 2001; Footitt et al., 2002; Hayashi et al., 2002). Comparison of the phenotypes and metabolic deficiencies in these mutants led Fulda et al. (2004) to propose two possible models by which LACS6/LACS7 and PXA1 contribute to fatty acid β-oxidation, a long-debated and not yet resolved issue. Recent findings using human and yeast homologs of PXA1 provide strong evidence that ABC transporters act upon acyl-CoAs, rather than free fatty acids (van Roermund et al., 2008), findings that may help to guide future studies in plants.

It is now established that a number of AAE proteins from other clades can also activate long-chain fatty acids, at least in vitro and/or when expressed in E. coli. As discussed in the next section, it remains a challenge to identify the physiological substrates and biological roles of these proteins. It may be some time before we know the full complement of long-chain acyl-CoA synthetases in Arabidopsis and other plants.

The difficulties in identifying substrates for the 4CL-like proteins in Clade V

One of the ancient and now most completely characterized clades, Clade IV, contains four genes encoding isozymes of 4-coumaroyl-CoA ligase (4CL) that catalyze key reactions in phenylpropanoid biosynthesis (Ehlting et al., 1999; de Azevedo Souza et al., 2008; Hu et al., 2010). These genes, and their orthologues in poplar, rice, and P. patens, have recently been reviewed (Costa et al., 2005; de Azevedo Souza et al., 2008), so we will not discuss them in detail here. The related Clade V members were named 4CL-like genes with the expectation that they might also act in phenylpropanoid synthesis or in related pathways of secondary metabolism (Costa et al., 2005; de Azevedo Souza et al., 2008). However, each of the proteins in Clade V contains a recognizable PTS1-type peroxisomal targeting sequence, indicating that none is likely to function in phenylpropanoid metabolism, which occurs in the cytoplasm and ER.

Identifying the substrates and physiological roles of the Clade V enzymes has proved to be a vexing problem that illustrates the difficulty of understanding the evolution of the AAE superfamily as a whole. Three members of the clade, At4g05160, At5g63380, and At1g20510, showed high activities with medium- and long-chain fatty acid substrates in assays of purified recombinant proteins, as did At1g62940, which has modest similarity to Clade V proteins and the 4CL enzymes. In addition, each of these complemented the E. coli fadD mutant for growth on 18:1 and/or 10:0 as sole carbon source (Kienow et al., 2008; de Azevedo Souza et al., 2009). These results suggest that the proteins are fatty acyl-CoA synthetase enzymes, even though they are highly divergent from the LACS proteins in Clade I. It is noteworthy that they do not appear to be functionally redundant with the peroxisomal fatty acyl-CoA synthetases encoded by LACS6 and LACS7, which are required for mobilization of fatty acids during germination (Fulda et al., 2004). Furthermore, biochemical and genetic evidence (Koo et al., 2006; Kienow et al., 2008) indicates that At1g20510, now named OPCL1 (3-oxo-2-(2′-[Z]-pentenyl) cyclopentane-1-octanoic acid CoA ligase), activates intermediates in jasmonic acid synthesis for β-oxidation in the peroxisome. By contrast, At1g62940, designated ACOS5 (acyl-CoA synthetase 5), appears to activate hydroxyl acids and is required for sporopollenin production in the tapetum cells of the anther (de Azevedo Souza et al., 2009). Interestingly, the Clade V subgroup containing OPCL1 contains four additional genes in Arabidopsis (At1g20480, At1g20490, At1g20500 and At5g38120) that are products of recent gene-duplication events. The rice genome contains only one member of this group, and the poplar genome one. Despite these clues from comparative genomics, considerable research effort has failed to identify a definite physiological function for any other member of Clade V in Arabidopsis or any other species (Costa et al., 2005; Schneider et al., 2005; Koo et al., 2006; de Azevedo Souza et al., 2008; Kienow et al., 2008).

The question posed by these observations is whether OPCL1 and ACOS5 have evolved into specific metabolic roles while maintaining the ability to recognize very different chemical structures as substrates, or whether they recognize only their physiological substrates in planta, so that in vitro assays and the complementation of the E. coli fadD mutants are misleading indicators of their true function. Different analyses have yielded different results. One approach to this question has been to complete kinetic analysis of purified recombinant proteins and to use the kcat/Km parameter to determine the most efficiently converted substrate (from those tested) (Costa et al., 2005; Schneider et al., 2005; Kienow et al., 2008). Results from this approach suggested that At5g63380 and At4g05160 may activate intermediates of jasmonate synthesis for entry into the β-oxidation pathway (although At4g05160 exhibited a higher kcat/Km for nonanoic (9:0) fatty acid). To date, however, genetic evidence has failed to support a role for these proteins in jasmonate synthesis, relative to OPCL1 (Koo et al., 2006). Alternatively, At4g05160, At5g63380 and At1g62940 were analyzed using a recently developed computational protocol for in silico assignment of substrate preference based on homologous crystal structures (Khurana et al., 2010). All three enzymes could be predicted as either coumarate:CoA ligases or long chain acyl:CoA ligases, depending on whether the whole protein sequences or limited numbers of active site residues were included in the computations.
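The kcat/Km comparison mentioned above amounts to ranking the tested substrates by catalytic efficiency. A minimal sketch follows; the substrate labels and kinetic values are hypothetical placeholders chosen for illustration and are not measurements from any of the cited studies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class KineticMeasurement:
    """Steady-state parameters for one enzyme-substrate pair."""
    substrate: str
    kcat_per_s: float  # turnover number, in s^-1
    km_molar: float    # Michaelis constant, in mol/L

    @property
    def efficiency(self) -> float:
        """Catalytic efficiency kcat/Km, in M^-1 s^-1."""
        if self.km_molar <= 0:
            raise ValueError("Km must be positive")
        return self.kcat_per_s / self.km_molar


def rank_by_efficiency(measurements: List[KineticMeasurement]) -> List[KineticMeasurement]:
    """Sort measurements from most to least efficiently converted substrate."""
    return sorted(measurements, key=lambda m: m.efficiency, reverse=True)


# Hypothetical values for illustration only.
measurements = [
    KineticMeasurement("substrate_A", kcat_per_s=2.0, km_molar=50e-6),
    KineticMeasurement("substrate_B", kcat_per_s=0.5, km_molar=5e-6),
    KineticMeasurement("substrate_C", kcat_per_s=3.0, km_molar=300e-6),
]

ranked = rank_by_efficiency(measurements)
for m in ranked:
    print(f"{m.substrate}: kcat/Km = {m.efficiency:.2e} M^-1 s^-1")
```

In this toy data set the substrate with the lowest turnover number still ranks first because of its low Km. The ranking also covers only the substrates that were actually tested, which is the limitation the text points out.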

The strong germination and biochemical phenotypes of the lacs6 lacs7 double mutant (Fulda et al., 2004) do provide a means to better understand the physiological significance of the CoA-ligase activities in Clade V proteins against long-chain (14–18 carbons) fatty acids. lacs6 lacs7 double mutant plants do not break down seed storage lipids and require sucrose supplementation for seedling establishment (Fulda et al., 2004). Evidently, the three Clade V enzymes that show in vitro activities towards long-chain fatty acids (OPCL1, At5g63380 and At4g05160) do not alleviate this phenotype, but this may only reflect their low expression during seedling germination and establishment. By expressing cDNAs encoding each of these enzymes under control of the LACS7 (or LACS6) promoter in the lacs6 lacs7 double mutant, it should be possible to infer the physiological relevance of the observed in vitro activities by whether or not the transgenes complement the mutant phenotype. It appears that strategies like this will be required to address the questions raised by the broad substrate range of some AAE proteins when assayed in vitro. More generally, the examples in this section demonstrate the importance of taking biochemical, genetic, and cell-biology approaches in parallel to identify the physiological substrate and biochemical role of individual AAE enzymes. Of course, this is true for any gene product, but the AAE superfamily is providing particularly pointed examples in this respect.

Microbial and animal proteins do not always identify plant orthologues

One practical consequence of evolutionary mechanisms is the observation that orthologous proteins from different organisms are typically more closely related than paralogous proteins from the same organism. Thus, BLAST searches against the Arabidopsis genome using sequences of yeast fatty acyl-CoA ligases, Faa1p and Faa4p, identify the nine LACS proteins of Clade I as the closest matches, followed by the proteins AAE15 and AAE16, which are also in Clade I and are related acyl-ACP ligases (Koo et al., 2005). Sadly, BLAST searches with sequences of characterized microbial enzymes do not always result in such facile identification of the Arabidopsis orthologues. For example, benzoyl-CoA ligase proteins from Rhodopseudomonas palustris (Gibson et al., 1994) and Thauera aromatica (Schuhle et al., 2003) identify members of Clades IV, V, and VII as high matches, while the Arabidopsis benzoyl-CoA ligase in Clade VI (BZO1, At1g65880; Kliebenstein et al., 2007) ranks 30th by BLAST score. A second example is the Arabidopsis O-succinylbenzoyl-CoA ligase encoded by AAE14 (At1g30520) (Kim et al., 2008). This enzyme catalyzes a step in the synthesis of phylloquinone, the one-electron carrier at the A1 site of photosystem I. Arabidopsis proteins catalyzing other steps in phylloquinone synthesis have been readily identified because they are often the closest homologues to eubacterial enzymes known to carry out the corresponding reactions of phylloquinone synthesis in the cyanobacterium Synechocystis PCC6803 and menaquinone synthesis in E. coli. Unfortunately, when the bacterial AAE14 orthologue (Synechocystis CoA ligase MenE) was used as a BLAST query against the Arabidopsis genome, six AAE proteins received scores between 1.2 × 10⁻¹⁶ and 5.4 × 10⁻¹⁰, while AAE14 (score = 6.8 × 10⁻¹¹) placed fourth in this ranking. A similar result was obtained with the E. coli MenE protein. However, the aae14 mutants have a seedling lethal phenotype similar to other mutants deficient in phylloquinone synthesis, and it was this finding that recommended AAE14 as the MenE orthologue (Kim et al., 2008).
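The ranking problem described above can be made concrete with a small sketch that sorts hits by score, with smaller values treated as better matches. Only three values come from the text: the best score (1.2 × 10⁻¹⁶), the worst (5.4 × 10⁻¹⁰), and the AAE14 score (6.8 × 10⁻¹¹). The other three scores and all hit names other than AAE14 are hypothetical placeholders chosen to reproduce the reported fourth-place ranking.

```python
from typing import Dict, List, Tuple

# Scores for six hits; smaller means a better match.
# Values marked "hypothetical" are illustrative, not from the text.
hit_scores: Dict[str, float] = {
    "hit_a": 1.2e-16,   # best score quoted in the text
    "hit_b": 3.0e-14,   # hypothetical
    "hit_c": 2.0e-12,   # hypothetical
    "AAE14": 6.8e-11,   # quoted in the text
    "hit_d": 1.0e-10,   # hypothetical
    "hit_e": 5.4e-10,   # worst score quoted in the text
}


def rank_hits(scores: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Return (rank, name, score) tuples ordered from best to worst match."""
    ordered = sorted(scores.items(), key=lambda item: item[1])
    return [(i + 1, name, score) for i, (name, score) in enumerate(ordered)]


def rank_of(name: str, scores: Dict[str, float]) -> int:
    """Return the 1-based rank of one named hit."""
    for rank, hit_name, _ in rank_hits(scores):
        if hit_name == name:
            return rank
    raise KeyError(name)


for rank, name, score in rank_hits(hit_scores):
    print(f"{rank}. {name}: {score:.1e}")

aae14_rank = rank_of("AAE14", hit_scores)
```

A query that returns its true orthologue in fourth place, separated from the top hit by several orders of magnitude, gives no basis for picking it out on sequence similarity alone, which is why the mutant phenotype was decisive.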

Convergent evolution has occurred for some AAE enzymes

It seems likely that convergent evolution has occurred several times within the AAE superfamily. The identification of ACOS5 as an acyl-CoA synthetase with activity against hydroxyl fatty acids (de Azevedo Souza et al., 2009), and the in vitro acyl-CoA synthetase activities observed for OPCL1, At5g63380 and At4g05160 are examples, although some caution must be taken in assuming similar in vivo roles based on in vitro substrate specificities. A more striking example is provided by the identification of At5g36880 (ACS) and At3g16910 (AAE7/ACN1) as acetyl-CoA synthetase enzymes of the chloroplast and peroxisome, respectively (Turner et al., 2005; Lin and Oliver, 2008). The ACS protein acts primarily to metabolize acetate generated in the chloroplasts as a product of the aerobic fermentation of pyruvate (Lin and Oliver, 2008). The ACN1 isozyme is required for conversion of free acetate in peroxisomes to acetyl-CoA, which enters the glyoxylate pathway for conversion to carbohydrate (Turner et al., 2005). ACS (Clade II) and ACN1 (Clade VI) are very distantly related in the AAE superfamily, with a protein–protein BLAST score of 4.3 × 10⁻¹² and <20% overall sequence identity. By comparison, ACS has 59% identity to the human ACS orthologue, ACS2, while ACN1 is 45% identical to another Clade VI protein, BZO1, which acts on a structurally very different carboxyl substrate, benzoic acid (Kliebenstein et al., 2007). Nevertheless, there appears to be little doubt that both ACS and ACN1 have acetate as their sole or primary physiological substrate. Attempts have been made to predict substrates for AAE enzymes using computer programs based on both hidden Markov models (HMM) and comparisons of amino acids considered to be crucial to substrate recognition (specificity-determining residues [SDR]) (Khurana et al., 2010). These programs may help to refine the classification of some AAE enzymes. However, they did not identify ACN1 as an acetyl-CoA synthetase. In addition, other substrates were not included in the model building, so that the substrates for AAE14/At1g30520 (O-succinyl-benzoic acid) and BZO1/At1g65880 (benzoic acid) could not be predicted; instead these last two enzymes were classified as long-chain acyl-CoA synthetases.
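The percent identity figures quoted above depend on how identity is counted over a pairwise alignment. The sketch below uses one common convention as an assumption: identical residues divided by the number of aligned columns in which neither sequence has a gap. Other conventions, such as dividing by the full alignment length or by the shorter sequence length, give different numbers. The toy alignment is hypothetical and does not represent any AAE protein.

```python
GAP = "-"


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences have a residue.

    Both inputs must come from the same pairwise alignment and therefore
    have equal length, with '-' marking gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for res_a, res_b in zip(aligned_a.upper(), aligned_b.upper()):
        if res_a == GAP or res_b == GAP:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment: 6 ungapped columns, 5 of them identical.
toy_a = "MKT-AYLL"
toy_b = "MRTQAY-L"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

Whatever the convention, the point made in the text holds: two enzymes with the same physiological substrate can share far less identity with each other than each does with relatives acting on different substrates.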

Convergent evolution of function, when coupled with detailed structural information on the proteins, provides fascinating insights into evolutionary processes, and the relationship of protein structure to enzyme function. Unfortunately, it also reduces our ability to use sequence analysis and genomics techniques to identify orthologues and provide a comprehensive map of biochemistry onto the AAE superfamily of proteins. For at least some of the AAEs, phylogenetic relationships may not be a good guide to enzyme function.

Conclusions and Perspectives

Members of the large superfamily of AAE proteins in Arabidopsis are highly diverged in terms of amino acid sequence, but can nevertheless be assigned to the family on the basis of a set of strongly conserved motifs. The same is true of the adenylate-forming proteins in other organisms (Gulick, 2009). Available crystal structures (Gulick, 2009; Hu et al., 2010) explain the roles of these conserved motifs in terms of the unique two-step mechanism of these enzymes. The substrate binding site is not highly conserved, and available evidence indicates that the sequence and structure in this region are amenable, during evolution, to alterations that can accommodate a new substrate. Some family members may have substrate binding sites that accommodate a range of structurally diverse carboxyl substrates (Gulick, 2009). This proposed flexibility of the substrate binding site could explain the convergent evolution among long-chain acyl-CoA synthetase and acetyl-CoA synthetase enzymes. It may also underlie the difficulties associated with identifying the true physiological substrates and biological roles of some of the AAE proteins. In this respect, the publication of the first AAE crystal structure from plants, for 4-coumarate:CoA ligase from poplar, offers the first opportunity to map chemical structures into the substrate-binding pocket and explain substrate specificities of 4CLs, and perhaps other types of AAE enzyme as well, based on substrate interactions with residues lining the pocket (Hu et al., 2010; Khurana et al., 2010). As structures become available for more plant AAE proteins (preferably complexed with their substrates), they will be important tools to help answer the vexing questions raised by the results of biochemical, genetic, and other approaches.

Acknowledgements

Like everyone else in the community, we owe a huge debt of thanks to the Arabidopsis Genome Initiative, and to the many other research groups that have contributed to the development of resources for the Weed. We thank Vance Kelly and Catherine Mason for technical assistance. Research on lipid metabolism in our laboratories is funded by the United States Department of Agriculture, Current Research Information System project no. 6435-41000-106-00D to J.S., and by grants from the U.S. National Science Foundation (grants MCB-0420199 and DBI-0701919), the Agricultural and Food Research Initiative Competitive Grant no. 2010-65115-20393 from the USDA National Institute of Food and Agriculture, and the Agricultural Research Center at Washington State University, to J.B.