Positive selection drives adaptive diversification of the 4-coumarate: CoA ligase (4CL) gene in angiosperms

Lignin and flavonoids play a vital role in the adaptation of plants to a terrestrial environment. 4-Coumarate: coenzyme A ligase (4CL) is a key enzyme of general phenylpropanoid metabolism that provides the precursors for both lignin and flavonoid biosynthesis. However, very little is known about how such essential enzymatic functions evolve and diversify. Here, we analyze 4CL sequence variation patterns in a phylogenetic framework to identify the evolutionary forces that lead to functional divergence. The results reveal that lignin-biosynthetic 4CLs are under positive selection. The majority of the positively selected sites are located in the substrate-binding pocket and the catalytic center, indicating that nonsynonymous substitutions might contribute to the functional evolution of 4CLs for lignin biosynthesis. The evolution of 4CLs involved in flavonoid biosynthesis is constrained by purifying selection, which maintains the ancestral role of the protein in response to biotic and abiotic factors. Overall, our results demonstrate that protein sequence evolution via positive selection is an important evolutionary force driving adaptive diversification in 4CL proteins in angiosperms. This diversification is associated with adaptation to a terrestrial environment.


Introduction
Lignin and flavonoids are thought to play vital roles in the adaptation of plants to terrestrial environments (Rozemaa et al. 2002; Weng and Chapple 2010; Agati et al. 2013). 4-Coumarate: CoA ligase (4CL; EC 6.2.1.12) is a key enzyme that functions in an early step of the general phenylpropanoid pathway. 4CL converts 4-coumaric acid and other cinnamic acids, such as caffeic acid and ferulic acid, into the corresponding CoA thiol esters, which are subsequently used for the biosynthesis of numerous secondary metabolites, including flavonoids, isoflavonoids, lignin, suberins, coumarins, and wall-bound phenolics (Ehlting et al. 1999; Saballos et al. 2012). The 4CL gene family is typically small, with four members in Arabidopsis (Hamberger and Hahlbrock 2004), five in rice (Gui et al. 2011; Sun et al. 2013), and four in soybean (Lindermayr et al. 2002). 4CL isoforms with different substrate specificities may direct the flow from general phenylpropanoid metabolism into the different pathways for specific end products (Souza et al. 2008).
In dicots, 4CLs can be divided into two distinct groups: class I and class II. Disruption of 4CL expression has demonstrated that class I 4CLs participate in lignin formation, while class II 4CLs affect flavonoid metabolism (Lee et al. 1997; Hu et al. 1998; Ehlting et al. 1999; Harding et al. 2002; Nakashima et al. 2008). The remarkable functional diversity of 4CL suggests that it may be subject to positive Darwinian selection. However, how the 4CL genes evolve and functionally diverge, and whether natural selection plays a role in their evolution, have been poorly studied. In this study, we analyzed nucleotide divergence in the 4CL genes from 16 species and used likelihood methods with various evolutionary models to investigate potential patterns of positive selection.

Sequence data collection
All known and reported 4CL protein-coding sequences from dicots, monocots, and gymnosperms (loblolly pine) were retrieved from the National Center for Biotechnology Information (NCBI). In total, 42 4CL protein sequences from 16 species were collected and are listed in Table S1.

Phylogenetic analysis
The 4CL protein-coding sequences were aligned using the program CLUSTALW implemented in MEGA5 (Tamura et al. 2011) and manually edited. Highly variable regions, indels, and gaps were excluded. A phylogenetic tree was constructed using MEGA5 with the neighbor-joining (NJ) method. The reliability of the branches was evaluated by 1000 bootstrap replicates.
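The bootstrap step can be illustrated with a minimal Python sketch. It shows only the column-resampling idea behind a bootstrap replicate, not the MEGA5 implementation, and it does not build the NJ tree itself. The three toy sequences and the function names `bootstrap_alignment` and `p_distance` are hypothetical and are not taken from the 4CL dataset.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling alignment columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[n]) != length for n in names):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(length) for _ in range(length)]
    return {n: "".join(alignment[n][c] for c in cols) for n in names}

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

# Hypothetical toy alignment (not real 4CL sequences).
toy = {
    "seqA": "ATGGCTACCGAA",
    "seqB": "ATGGCAACCGAG",
    "seqC": "ATGTCTACGGAA",
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]

# Each replicate would normally be turned into a distance matrix and an NJ tree;
# branch support is the fraction of replicate trees that recover a given branch.
first = replicates[0]
print(p_distance(first["seqA"], first["seqB"]))
```

In a full analysis, each of the 1000 replicates yields its own tree, and the bootstrap value of a branch is the percentage of those trees in which the branch appears.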

Test for selection
The nonsynonymous/synonymous substitution rate ratio (ω = dN/dS) provides a measure of the selective pressure at the protein level, where ω = 1, ω < 1, or ω > 1 indicates neutral evolution, purifying selection, or positive selection, respectively. The hypothesis of positive selection was tested using the CODEML program in the PAML v4.3b package (Yang 2007). Three approaches incorporated into the program were used: branch, site, and branch-site models. In the lineage-specific selection analyses, we employed the recently developed dynamic programming procedure to search for the optimal branch-specific model that had a likelihood equal to or close to the global maximum likelihood for all of the possible models (Zhang et al. 2011). In the site-specific selection analyses, the dataset was fitted to three pairs of codon substitution models (M2a vs. M1a, M3 vs. M0, and M8 vs. M7). Branch-site model A was used to detect positively selected sites along the branches that showed elevated ω ratios. The sites under positive selection were identified by the Bayes Empirical Bayes (BEB) approach.
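As a rough illustration of how such analyses are typically set up, the Python sketch below writes CODEML control-file text for the site models and for branch-site model A with its null. The option names (`model`, `NSsites`, `fix_omega`, `omega`, and so on) are standard CODEML control-file options, but the file names, the `CodonFreq` and `cleandata` choices, and the starting values are assumptions; the settings actually used in this study are not reported here. The helper `codeml_ctl` is hypothetical.

```python
def codeml_ctl(seqfile, treefile, outfile, model, nssites, fix_omega=0, omega=0.4):
    """Return the text of a CODEML control file for one run (sketch only)."""
    ns = " ".join(str(n) for n in nssites)
    lines = [
        f"seqfile = {seqfile}",
        f"treefile = {treefile}",
        f"outfile = {outfile}",
        "runmode = 0",    # use the user-supplied tree
        "seqtype = 1",    # codon sequences
        "CodonFreq = 2",  # F3x4 codon frequencies (an assumption)
        f"model = {model}",      # 0: one ratio, 2: ratios differ between labeled branches
        f"NSsites = {ns}",       # 0: M0, 1: M1a, 2: M2a, 3: M3, 7: M7, 8: M8
        "fix_kappa = 0",
        "kappa = 2",             # starting value (an assumption)
        f"fix_omega = {fix_omega}",  # 1 fixes omega at the value given below
        f"omega = {omega}",
        "cleandata = 0",  # keep sites with gaps or ambiguity (an assumption)
    ]
    return "\n".join(lines) + "\n"

# Hypothetical file names; the foreground branch is marked with #1 in the tree file.
runs = {
    "site_models": codeml_ctl("4cl_codon.phy", "4cl_tree.nwk", "site.out",
                              model=0, nssites=[0, 1, 2, 3, 7, 8]),
    "branch_site_A": codeml_ctl("4cl_codon.phy", "4cl_tree_fg.nwk", "bsA.out",
                                model=2, nssites=[2]),
    "branch_site_A_null": codeml_ctl("4cl_codon.phy", "4cl_tree_fg.nwk", "bsA_null.out",
                                     model=2, nssites=[2], fix_omega=1, omega=1),
}

for name, text in runs.items():
    print(f"# {name}\n{text}")
```

The alternative and null branch-site runs differ only in `fix_omega` and `omega`, which is what makes their likelihoods comparable in a likelihood ratio test.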

Results and Discussion
Angiosperm 4CL gene phylogeny
The conserved protein-coding sequences of 42 4CLs from 16 species were used to reconstruct a phylogenetic tree. The analysis revealed that all of the 4CL genes fell into one of two general groups: A and B (Fig. 1). Group A contains representatives from all of the available dicots, including verified 4CL sequences from Arabidopsis, poplar, and soybean. The monocot 4CL isoenzymes in group B form a highly supported monophyletic group and are thus separated from the dicot isoforms. The gymnosperm 4CLs, the loblolly pine isoforms Lp4CL1 and Lp4CL2, also formed a separate cluster that was closest to the monocot isoenzymes.

The functional divergence of the 4CL gene family
The dicot 4CLs can be divided into two distinct groups that are designated dicots class I and dicots class II (Fig. 1). Previous studies have demonstrated that 4CL genes in dicots class I are associated with lignin accumulation, while dicots class II 4CLs are involved in the metabolism of other phenolic compounds, such as flavonoids. For example, the genes Pt4CL1, At4CL1, At4CL2, At4CL4, Gm4CL1, and Gm4CL2 in dicots class I are involved in lignin formation (Hu et al. 1998; Ehlting et al. 1999; Lindermayr et al. 2002). In contrast, the genes Pt4CL2, At4CL3, and Gm4CL4 in dicots class II are believed to play a role in flavonoid biosynthesis (Uhlmann and Ebel 1993; Hu et al. 1998; Ehlting et al. 1999).
The 4CLs from monocots can also be classified into two groups, which are designated monocots class I and monocots class II (Fig. 1). The 4CL genes in monocots class I are associated with lignin accumulation. For example, Pv4CL1 in monocots class I is the key 4CL isoenzyme involved in lignin biosynthesis because RNA interference of Pv4CL1 reduces the activity of extractable 4CL by 80%, leading to a reduction in lignin content and a decrease in the guaiacyl unit composition (Xu et al. 2011). The Os4CL3 gene in the same group is also involved in lignin biosynthesis because suppression of Os4CL3 expression results in significant lignin reduction, retarded growth, and other morphological changes (Gui et al. 2011). In contrast, the genes in monocots class II (Fig. 1) are likely to participate in the flavonoid biosynthetic pathway. For example, based on phylogenetic analysis, Xu et al. (2011) hypothesized that Pv4CL2 in monocots class II mainly participates in the flavonoid biosynthesis pathway in switchgrass. Recent research (Sun et al. 2013) has demonstrated that the primary function of Os4CL2 is to channel the activated 4-coumarate to chalcone synthase and subsequently to the different branch pathways of flavonoid secondary metabolism, leading to flower pigments and UV-protective flavonols and anthocyanins. The remarkable functional diversity of not only dicot but also monocot 4CLs suggests that 4CL may be subject to positive Darwinian selection.

Evolutionary patterns among lineages and among sites
To test the hypothesis that positive selection acts on 4CLs, we applied branch-specific models to the 4CL dataset. The 40-ratio model (40RM), with 40 different ω ratios, was the optimal branch model (Table S2). The six branches where ω was >1 were designated branches a, b, c, d, e, and f (Fig. 1). To examine whether the ω ratio for each of these branches was significantly greater than the background ratio, log-likelihood values were calculated from two-ratio models that assigned the ratios ω_a, ω_b, ω_c, ω_d, ω_e, and ω_f to branches a, b, c, d, e, and f, respectively, and the ratio ω_0 to all other branches. Each of these two-ratio models was compared with the one-ratio model (M0). The one-ratio model, which assumes the same ω parameter for the entire tree, yielded a log-likelihood value of −28951.48 with an estimated ω_0 of 0.089 (Table 1). The low average ratio indicated the dominant role of purifying selection in the evolution of the 4CL genes. The two-ratio models for branches a, c, d, and f fit the data significantly better than the one-ratio model (Table 2), resulting in the rejection of the null hypothesis that the 4CL genes evolved at constant rates along the branches. To test whether the six ω ratios were significantly higher than 1, we calculated the log-likelihood values using the two-ratio models with ω_a, ω_b, ω_c, ω_d, ω_e, and ω_f fixed to 1 (Table 1). Likelihood ratio tests were then used to compare each two-ratio model with its corresponding fixed two-ratio model. The likelihood ratio tests in Table 2 revealed that the ω ratios for branches a, b, c, d, e, and f were not significantly greater than one.
We therefore conclude that the evolution of the [...]
[Table 1 (fragment): two-ratio model for branch a, ω_0 = 0.089, ω_a = 3.463; two-ratio model with fixed ω_a = 1, lnL = −28949.58]
Because the branch model test averages the ω ratios across all of the sites and is a very conservative test for positive selection, we applied site-specific models to the 4CL dataset. The log-likelihood values and the parameter estimates under models with variable ω ratios among the sites are listed in Table 1. Two site classes (M3, K = 2) fit the data significantly better than one site class (M0) by 690.88 log-likelihood units, revealing significant variation in the selective pressure among the sites. However, none of the site-specific models that allow for the presence of positively selected sites, such as M2a (selection), M3 (discrete), and M8 (beta and ω), suggested the existence of positively selected sites with ω > 1. The majority of the sites in the 4CL sequences appear to be under strong selective constraints.
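The M3 versus M0 comparison can be written out as a worked likelihood ratio statistic. This is a sketch under two assumptions: that the reported 690.88 is the difference in log-likelihood between the two models (not twice that difference), and that M0 is the one-ratio model with the log-likelihood of −28951.48 reported above.

```latex
\ell_{\mathrm{M3}} = \ell_{\mathrm{M0}} + 690.88 = -28951.48 + 690.88 = -28260.60
\qquad
2\Delta\ell = 2\,(\ell_{\mathrm{M3}} - \ell_{\mathrm{M0}}) = 2 \times 690.88 = 1381.76
```

Assuming that M3 with K = 2 adds two free parameters relative to M0 (one class proportion and one additional ω), the statistic would be compared with a χ² distribution with 2 degrees of freedom, whose 1% critical value is 9.21.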

Evidence for positive selection on lignin-related 4CL genes
Positive selection is difficult to detect because it often operates episodically on just a few amino acid sites, and purifying selection may mask the signal. Branch-site models can detect positive selection that has affected a small number of sites along prespecified lineages. We used branch-site model A to test the hypothesis of positive selection. As detailed in Table 3, branch-site model A using branch c as the foreground branch (MAc) fit the data significantly better than M1a (2ΔlnL = 25.56, df = 2, P < 0.00) and null model A for branch c (2ΔlnL = 11.58, df = 1, P < 0.00). This result also suggested that 5.1% of amino acids are under positive selection in lineage c with ω = 15.95 (Table 1). Branch-site model A using branch d as the foreground branch (MAd) fit the data significantly better than M1a (2ΔlnL = 17.2, df = 2, P < 0.00) and null model A for branch d (2ΔlnL = 6.58, df = 1, P < 0.00) (Table 3). This result also suggested that 5.3% of the protein sites are under positive selection in lineage d with ω = 14.873 (Table 1). When the analysis was repeated with branch f as the foreground branch (MAf), model A fit the data significantly better than M1a (2ΔlnL = 31.82, df = 2, P < 0.00) and null model A for branch f (2ΔlnL = 9.92, df = 1, P < 0.01) (Table 3), which suggested that 4.5% of the amino acids are under positive selection in lineage f with ω = ∞ (Table 1). Model A using branch a as the foreground branch (MAa) did not fit the data better than the two null models in test 1 and test 2 (Table 3). This evidence supports the hypothesis of positive selection on lineages c, d, and f.
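The likelihood ratio tests above can be checked with a short Python sketch that converts each reported 2ΔlnL value into a χ² upper-tail probability. It uses the closed-form tail expressions for 1 and 2 degrees of freedom, so only the standard library is needed. The function name `chi2_sf` and the dictionary layout are hypothetical; the test statistics are the ones quoted in the paragraph above.

```python
import math

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square statistic for df = 1 or df = 2."""
    if stat < 0:
        raise ValueError("the test statistic must be non-negative")
    if df == 1:
        # P(Z^2 > x) for a standard normal Z
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")

# Reported 2*delta(lnL) values for branch-site model A (Table 3).
# Test 1: model A vs. M1a (df = 2); test 2: model A vs. null model A (df = 1).
reported = {
    "c": {"test1": 25.56, "test2": 11.58},
    "d": {"test1": 17.2, "test2": 6.58},
    "f": {"test1": 31.82, "test2": 9.92},
}

p_values = {
    branch: {
        "test1": chi2_sf(stats["test1"], df=2),
        "test2": chi2_sf(stats["test2"], df=1),
    }
    for branch, stats in reported.items()
}

for branch, p in p_values.items():
    print(f"branch {branch}: test 1 P = {p['test1']:.2e}, test 2 P = {p['test2']:.2e}")
```

Under these χ² approximations, all six tests fall below the 0.05 significance level, which is consistent with the conclusion drawn for lineages c, d, and f.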
Based on the BEB method, four and six candidate sites for positive selection were identified in dicots and monocots, respectively (Table 1). These positively selected sites are labeled in Figure 2. Sites 181I, 202S, 211S, 223L, 234K, and 239K are located in the substrate-binding pocket, and 379M and 423T are located in the catalytic centers. Sites 65L, 69E, 79V, and 82C are located between the conserved sequence motifs A2 and A3, which form a phosphate-binding loop. Site 291S is close to motif A6, which is important for the formation of a stable tertiary structure. Thus, amino acid substitutions at these positively selected sites in the 4CL genes might influence 4CL substrate specificity, activity, or secondary structure, which would in turn have a profound effect on 4CL function. We have demonstrated that the 4CL genes in branches c and f, which are associated with lignin accumulation, are under positive selection. Interestingly, positive selection is also detected on the At4CL genes in branch d. However, the role of these proteins in lignin formation is similar to that of other proteins from dicots class I (Hu et al. 1998; Ehlting et al. 1999). We hypothesize that positive selection on the At4CL genes may be related to functional specialization.

Selective constraints on flavonoid-related 4CLs in dicots
The 4CL genes involved in flavonoid biosynthesis (dicots class II and monocots class II, Fig. 1) have been largely conserved during plant evolution, suggesting that they are constrained by purifying selection. Land plants evolved from green algae in the mid-Ordovician over 450 million years ago (Langdale 2008). After arriving in terrestrial environments, the pioneering land plants were confronted with several major challenges, such as ultraviolet irradiation and desiccation stress. The presence of flavonoids in the earliest land plants, and the associated ability to resist UV irradiation, made survival on land possible for these plants (Rozemaa et al. 2002). The flavonoid pathway evolved prior to the lignin pathway. For example, bryophytes do not synthesize lignin but accumulate soluble phenylpropanoids, such as flavonoids and lignans (Weng and Chapple 2010). Flavonoids accumulate in the epidermal layer of extant plants, which has been shown to absorb over 90% of UV-B radiation (Stafford 1991). This evidence suggests that the ancestral role of 4CL was to participate in flavonoid biosynthesis and that this role was maintained during adaptation to a terrestrial environment.

Conclusions
4CLs play important roles in both lignin and flavonoid biosynthesis. 4CLs that play a role in lignin biosynthesis are subject to positive selection. This positive selection resulted in functional divergence after the monocot-dicot split approximately 200 million years ago. Positive selection could have been involved in the early stages of the evolution of the 4CL genes; 4CL evolves rapidly after speciation events. Strong purifying selection then operates on the novel 4CL genes to maintain the protein's existing function. Based on the BEB method, four and six candidate sites for positive selection were identified in dicots and monocots, respectively (Table 1). Most of the positively selected sites are located in the substrate-binding pocket and the catalytic centers (Fig. 2). Therefore, amino acid replacements at these sites might imply neofunctionalization. This result is in agreement with previous findings that 4CL genes functionally diversified in angiosperms (Hu et al. 1998; Ehlting et al. 1999; Gui et al. 2011; Xu et al. 2011; Sun et al. 2013). Although several positively selected sites were detected using the branch-site model, we find that the 4CL gene family as a whole has experienced purifying selection rather than pervasive positive selection throughout its evolution. The 4CLs involved in flavonoid biosynthesis have been largely conserved during plant evolution and maintain the ancestral role in response to biotic or abiotic factors. These findings provide deeper insights into understanding the evolutionary

Supporting Information
Additional Supporting Information may be found in the online version of this article: