Introduction
The CCAAT box is a widespread regulatory sequence found in promoters and enhancers of a large number of genes (Bucher, 1990; Li et al., 1992; Mantovani, 1998). The functional importance of the CCAAT box, as a positive promoter element, has been well established in different systems (reviewed in Mantovani, 1998). Many DNA-binding proteins have been found to bind to CCAAT boxes (Li et al., 1992; Maity & Crombrugghe, 1998; Mantovani, 1999). NF-Y (also termed CBF) is a ubiquitous CCAAT-specific binding factor, which has a high affinity and sequence specificity for the CCAAT sequence (Li et al., 1992; Bellorini et al., 1997; Maity & Crombrugghe, 1998; Frontini et al., 2002).
The NF-Y CCAAT-specific binding factor is a heterotrimer composed of three subunits, all of which are essential for CCAAT binding. The three subunits are referred to throughout this manuscript as NF-YA (also known as CBF-B and HAP2), NF-YB (also CBF-A and HAP3) and NF-YC (CBF-C, HAP5). The NF-Y heterotrimer is assembled by the association of NF-YA with a tight dimer formed from NF-YB and NF-YC. This dimer produces a protein structure similar to the Histone Fold Motif, and it is with this complex surface that NF-YA associates. The resulting trimer has been shown to have a high affinity for DNA (Mantovani, 1999; Gusmaroli et al., 2001, 2002; Frontini et al., 2002; Romier et al., 2003). Genes encoding NF-Y subunits have been isolated from various organisms. In contrast to the situation in yeast and most vertebrates, in which all the subunits are encoded by single copy genes, multiple and distinct genes for each subunit have been identified in plant genomes (Edwards et al., 1998; Gusmaroli et al., 2001, 2002). Gusmaroli et al. (2001, 2002) identified 29 NF-Y subunit genes in Arabidopsis thaliana, including 10 NF-YAs, 10 NF-YBs and nine NF-YCs. A search of the GenBank database led to the identification of five independent NF-YA homologous genes, 10 NF-YB genes and 10 distinct NF-YC genes in the rice genome (J. Yang, unpublished data).
Duplication is a prevalent feature of plant genomes and many genes are found in tandem arrays or in duplicated segmental clusters (Cronk, 2001). Duplicate genes can arise from tandem duplications or from polyploidization events (Lawton-Rauh, 2003). Three potential evolutionary fates of duplicated genes have been suggested (Ohno, 1970; Lynch & Conery, 2000): (i) one copy of the pair simply becomes silenced by degenerative mutations after gene duplication (non-functionalization); (ii) one copy acquires a novel, beneficial function and becomes preserved by natural selection (neofunctionalization), with the other copy retaining the original function; (iii) both duplicates experience loss or reduction of expression for different subfunctions through degenerative mutations and establish distinct, complementary functions, so that the combined action of both gene copies is necessary to fulfil the requirements of the ancestral gene (subfunctionalization) (Hughes, 1994; Force et al., 1999). Whether duplicate genes follow different evolutionary patterns after duplication, and which factors determine their fate, are questions currently under intense research (Zhang et al., 2003).
There is often an acceleration of the rate of evolution following gene duplication (Li, 1985; Ohta, 1993, 1994; Lynch & Conery, 2000). Studies of several gene families have indicated that natural selection accelerates the fixation rate of non-synonymous substitutions shortly after a duplication event, presumably to adapt those proteins to a new or modified function (Zhang et al., 1998; Bielawski & Yang, 2003). However, an accelerated non-synonymous substitution rate could also be driven by relaxation, but not complete loss, of selective constraints. Here, duplicated proteins evolve under relaxed functional constraints for some period of time, after which functional divergence occurs when formerly neutral substitutions convey a selective advantage in a novel environment or genetic background (Bielawski & Yang, 2003). Meanwhile, degenerative mutations in regulatory subfunctions may also be accelerated under relaxed selective constraints and lead to subfunctionalization of duplicate genes.
For protein-coding genes, the traditional approach to inferring the magnitude of selective constraint and positive selection is to compare the non-synonymous (dN) and synonymous (dS) substitution rates (ω = dN/dS), with ω < 1.0, ω ≈ 1.0 and ω > 1.0 indicating purifying selection, neutral evolution and positive selection, respectively. However, the influence of selection on molecular evolution can also be evaluated by comparing the conservative and radical substitution rates in amino acid sequences. An amino acid substitution can be classified as either conservative or radical, depending on whether it involves a change in a certain physicochemical property of the amino acid (Zuckerkandl & Pauling, 1965; Dayhoff et al., 1972; Zhang, 2000). It has been proposed that substitutions which tend to conserve amino acid physicochemical properties, termed conservative substitutions, are more common than substitutions which cause large changes in physicochemical properties, termed radical substitutions (Clark, 1970; Dayhoff et al., 1972). This difference in frequency is usually explained by a higher intensity of purifying selection on radical mutations than on conservative mutations. A significantly higher rate of radical than of conservative non-synonymous substitution has been taken as evidence for positive Darwinian selection on radical substitutions (Hughes et al., 1990, 2000; Zhang, 2000; McClellan & McCracken, 2001).
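The two criteria described above can be illustrated with a minimal Python sketch. It is not part of the analysis reported here. The tolerance used to treat ω as approximately 1.0, the function names, and the charge grouping (positive: R, H, K; negative: D, E; all others neutral) are assumptions chosen for illustration; other physicochemical properties would give other conservative/radical partitions.

```python
# Illustrative sketch only; thresholds and groupings are assumptions.

# Hypothetical charge grouping of the 20 amino acids (one-letter codes).
POSITIVE = set("RHK")
NEGATIVE = set("DE")


def classify_omega(dn, ds, tol=0.05):
    """Return (omega, regime) for non-synonymous rate dn and synonymous rate ds.

    tol is an arbitrary band around 1.0 treated as 'neutral'; in practice a
    statistical test is needed to decide whether omega differs from 1.
    """
    if ds <= 0:
        raise ValueError("dS must be positive to form the ratio dN/dS")
    omega = dn / ds
    if abs(omega - 1.0) <= tol:
        regime = "neutral evolution"
    elif omega < 1.0:
        regime = "purifying selection"
    else:
        regime = "positive selection"
    return omega, regime


def charge_class(residue):
    """Map a one-letter amino acid code to its assumed charge class."""
    if residue in POSITIVE:
        return "positive"
    if residue in NEGATIVE:
        return "negative"
    return "neutral"


def is_radical_by_charge(old, new):
    """A substitution is radical (for charge) if it changes the charge class."""
    return charge_class(old) != charge_class(new)
```

Under these assumptions, `classify_omega(0.1, 0.5)` reports purifying selection, and a Lys to Glu replacement counts as radical with respect to charge whereas Leu to Ile counts as conservative.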
In this study, the likelihood-ratio test was used to examine the amino acid substitution rates of duplicate NF-Y genes in the A. thaliana and rice genomes. The selective influences on the evolution of plant NF-Y genes were evaluated by comparing the conservative and radical substitution rates. We address three related sets of questions. (i) How are these genes related to each other? Do the duplicates of plant NF-Y genes evolve at the same rates at the amino acid level following duplication? (ii) If the duplicates have evolved asymmetrically, do they exhibit similar amino acid substitution patterns in different plant genomes? What are the major factors that are responsible for the asymmetric divergence? (iii) Is the asymmetric evolution of duplicate NF-Y genes in gene sequences coupled to asymmetric divergence in gene functions?
Materials and Methods
The amino acid and nucleotide sequences of the NF-Y subunits in the Arabidopsis thaliana and rice (Oryza sativa ssp. japonica) genomes were compiled by searching the GenBank database using the BLASTP, PSI-BLAST and TBLASTN algorithms, with the filter setting at its default and an expectation cutoff of 1.0. The amino acid sequences of the HAP2 (accession number P06774), HAP3 (accession number P13434) and HAP5 (accession number Q02516) subunits from yeast (Saccharomyces cerevisiae) were used as queries. The accession numbers for all sequences are shown in Table 1. The NF-YB genes from other plants were also retrieved from GenBank in order to conduct a phylogeny-based comparison of the evolutionary patterns between different types of NF-YB subunits. The accession numbers for these sequences are shown in Fig. 3.
Table 1. Accession numbers of the NF-Y sequences used in this study

| Organism | NF-YA | NF-YB | NF-YC |
|---|---|---|---|
| Arabidopsis thaliana | NM_121287 (At5g12840) | NM_100774 (At1g09030) | NM_100768 (At1g08970) |
| | NM_112983 (At3g20910) | NM_130348 (At2g47810) | NM_104356 (At1g54830) |
| | NM_104294 (At1g54160) | NM_126937 (At2g13570) | NM_104496 (At1g56170) |
| | NM_112256 (At3g14020) | NM_117534 (At4g14540) | NM_125742 (At5g63470) |
| | NM_101621 (At1g17590) | NM_124138 (At5g47640) | NM_114718 (At3g48590) |
| | NM_105941 (At1g72830) | NM_115194 (At3g53340) | NM_124430 (At5g50480) |
| | NM_179402 (At1g30500) | NM_179974 (At2g38880) | NM_124429 (At5g50470) |
| | NM_179904 (At2g34720) | NM_179946 (At2g37060) | NM_122673 (At5g27910) |
| | NM_120734 (At5g06510) | NM_124141 (At5g47670) | NM_124431 (At5g50490) |
| | NM_111443 (At3g05690) | NM_102046 (At1g21970) | NM_123174 (At5g38140) |
| Oryza sativa | AC092262 | AB095439 | AC134235 |
| | AC123974 | AB095438 | AL442106 |
| | AP004746 | AB095440 | AL606641 |
| | AP005454 | AC104284 | AP003610 |
| | AP006458 | AC120529 | AP003546 |
| | | AP003266.1 | AP005392 |
| | | AP003266.2 | AP003852 |
| | | AY224530 | AP000364 |
| | | AP004179 | AP003875 |
| | | AP004791 | NM_195305 |
Amino acid sequences were aligned using Clustal X (Thompson et al., 1997). Nucleotide sequence alignments were adjusted to conform to the amino acid sequence alignments. Each subunit of the NF-Y complex contains a core region that is relatively conserved across duplicates and species, whereas the flanking regions are much less conserved and differ greatly in sequence identity and length. The core region of each subunit possesses all functional amino acid residues and is sufficient for subunit interactions and CCAAT binding (Gusmaroli et al., 2001; Romier et al., 2003). Therefore, the flanking sequences, which are ambiguous in the alignments, were not included in this study. All analyses in this study were based on the core region sequences of NF-YA, NF-YB and NF-YC, which comprise 59, 90 and 76 amino acid residues, respectively.
The phylogenetic relationships of the duplicates of each subunit were inferred using the neighbour-joining method (NJ, Saitou & Nei, 1987), implemented in the program HYPHY (http://www.hyphy.org/downloads/index2.html). The Jones empirical model of amino acid substitution was employed, with invariant sites and a gamma distribution of rates for variable sites estimated from the data. The trees for NF-YA, NF-YB and NF-YC were rooted using the yeast HAP2, HAP3 and HAP5 genes as outgroups, respectively. The maximum parsimony method (MP), implemented in PAUP* 4.0 (Swofford, 1998), and the Bayesian method, implemented in MrBayes V2.01 (http://morphbank.ebc.uu.se/mrbayes/download.php), were also used to infer the phylogenetic relationships of the NF-YB genes from various plants. The heuristic tree search under parsimony was conducted using the TBR (tree-bisection-reconnection) swapping algorithm. The GTR + I + G model (general-time-reversible with invariant sites and gamma-distributed rates for variable sites) of sequence evolution was employed in Bayesian inference, with model parameters estimated from the data. The Markov chains were run for 1 000 000 generations, and trees were sampled every 100 generations.
The likelihood-ratio test was used to examine whether the duplicates of the NF-Y genes evolved at the same rates at the amino acid level. For each duplication event (node in the tree), two different models were compared. One model assumes the same amino acid substitution rate on the two branches leading to the two duplicates but allows the rate on other branches to be different. The other model allows one of the duplicates to evolve at an independent rate. The codeml program in the PAML package (Yang, 1997) was applied to calculate the maximum likelihood values using the Jones empirical model of amino acid substitution. Twice the log-likelihood difference was compared to a chi-square distribution. A significant result suggests that the two branches have evolved at unequal rates.
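The test statistic described above can be sketched in a few lines of Python. This is an illustration rather than the procedure actually run with codeml. It assumes one degree of freedom, on the reasoning that the alternative model adds a single free rate parameter; the text does not state the degrees of freedom. The function name is hypothetical and the log-likelihood values in the usage note are made up.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood-ratio test with 1 degree of freedom (an assumption).

    lnl_null: maximised log-likelihood of the equal-rate model
    lnl_alt:  maximised log-likelihood of the model with one independent rate
    Returns (statistic, p_value).
    """
    # Twice the log-likelihood difference; clamp tiny negative values that
    # can arise from incomplete numerical optimisation.
    statistic = max(2.0 * (lnl_alt - lnl_null), 0.0)
    # For a chi-square variable with 1 d.f., P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

For example, with hypothetical values `lnl_null = -1523.4` and `lnl_alt = -1520.1`, the statistic is about 6.6, which exceeds the 5% critical value of 3.84 for one degree of freedom, so equal rates on the two branches would be rejected.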
To investigate whether the subunits which show asymmetries in rates of amino acid divergence exhibit similar amino acid substitution patterns in different plant genomes, we calculated the goodness of fit between an observed distribution of physicochemical changes inferred from well corroborated phylogenetic trees and an expected distribution based on the assumption of completely random amino acid replacement expected under the condition of selective neutrality, using the program TreeSAAP (Woolley et al., 2003). Six amino acid properties shown to correlate with rates of amino acid replacement (Xia & Li, 1998; McClellan & McCracken, 2001) were considered: composition of the side chain, hydropathy, isoelectric point, molecular volume, polar requirement, and polarity. We also investigated the numbers of conservative, moderate, radical and very radical physicochemical changes in amino acid property, relative to the total number of theoretically possible evolutionary pathways. To deduce the relative selective influence on each physical property of each amino acid (z-score) we compared these numbers to a normal distribution. The z-scores provide information on the direction in which selection is acting, while the goodness-of-fit score (GF-score) provides information on the intensity of that selection. Taken together, these scores describe the selective influences acting on each subunit (McClellan & McCracken, 2001).
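The logic of the goodness-of-fit and z-score comparison can be sketched generically as follows. This is not TreeSAAP's implementation. It assumes that changes for one property are binned into magnitude categories (for instance conservative, moderate, radical and very radical), that the expected proportions under random replacement are known, and that each z-score uses the normal approximation to a binomial proportion. The function names and the example counts are hypothetical.

```python
import math


def goodness_of_fit(observed, expected_props):
    """Chi-square style goodness-of-fit score between observed counts and
    the counts expected under the given neutral proportions."""
    n = sum(observed)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(observed, expected_props))


def category_z_scores(observed, expected_props):
    """z-score per category: positive values mean the category is more
    frequent than expected, negative values less frequent."""
    n = sum(observed)
    return [(o / n - p) / math.sqrt(p * (1.0 - p) / n)
            for o, p in zip(observed, expected_props)]


# Hypothetical counts for four magnitude categories of one property.
observed_counts = [40, 30, 20, 10]
neutral_props = [0.25, 0.25, 0.25, 0.25]
gf_score = goodness_of_fit(observed_counts, neutral_props)
z_scores = category_z_scores(observed_counts, neutral_props)
```

In this made-up example the first category is over-represented and the last is under-represented, the pattern expected when purifying selection removes the most radical changes.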
Discussion
Duplicated genes are common in genomes (Meyer, 2003), and attention has been focused on the divergence of sequences of duplicated genes and the consequent divergence of functions of the proteins they encode (Conant & Wagner, 2003). A number of cases of asymmetric divergence between duplicate genes have been reported (Van de Peer et al., 2001; Kondrashov et al., 2002; Wagner, 2002). Our results show that the NF-Y duplicates in plant genomes have evolved in different patterns, with some of the NF-YB and NF-YC duplicates, but not the NF-YA duplicates, showing significant evidence of asymmetric evolution. This difference is probably a result of their distinct roles in trimer formation and DNA-binding. The core domain of NF-YA is less than 60 amino acids long and consists of two subdomains. The subunit-associating subdomain is responsible for the sequence-specific interaction of the trimer, showing remarkable specificity for NF-YB/NF-YC among HFM dimers (Mantovani, 1999), while the DNA-binding subdomain is implicated in specific recognition of the CCAAT element. Comparison of the duplicate NF-YA sequences revealed that the key residues involved in DNA-binding or sequence-specific interaction are almost perfectly conserved, with only one of them showing a substitution of Ala for Gly, suggesting the presence of very strong selective constraints on both subdomains. It is thus unsurprising that the NF-YA subunits do not show evidence of asymmetric evolution. We note that the NF-YA sequences used are shorter than the NF-YB and NF-YC sequences, which may, to a certain extent, affect the power of the statistical tests. The likelihood-ratio test becomes more conservative in a data set with short sequences and low divergence (Anisimova et al., 2001). Therefore this result should be treated with caution, particularly when the test fails to reject the null hypothesis.
Asymmetries in amino acid substitution rates were detected in both NF-YB and NF-YC subunits, indicating that some duplicates of NF-YB and NF-YC have evolved at significantly different rates following gene duplication. The amino acid changes that have occurred in different duplicates are not evolutionarily equivalent. Most amino acid replacements seem to result in changes in hydropathy, polar requirement and polarity. Comparisons of magnitude classes also demonstrated that radical and very radical changes with regard to hydropathy, polar requirement and polarity happened more frequently in the NF-YC subunits than in the NF-YB subunits. This pattern was found in both the A. thaliana and rice subunits. Most interestingly, the LEC1-type and non-LEC1-type NF-YB subunits also showed significant heterogeneity in their amino acid substitution patterns with respect to hydropathy, polar requirement and polarity, suggesting that the physicochemical changes in sequences are coupled to asymmetric divergence in gene function. The amino acid properties do not change independently: the changes in hydropathy and polarity seem to be correlated in the NF-Y subunit sequences. It is unclear whether the conservative changes in composition of the side chain, isoelectric point and molecular volume in the NF-Y subunits are accompanied by changes in other amino acid properties. The relationships between the various amino acid property changes and their functional effects need to be addressed through more rigorous biochemical characterization of the residues of the NF-Y sequences.
The unequal evolutionary rates and distinct divergence patterns indicate different selective influences on the evolution of plant NF-YB and NF-YC subunits. Some duplicates of NF-YB and NF-YC are clearly subject to stronger purifying selection, as they showed little difference in amino acid substitution rates and a low level of divergence from each other. However, for the duplicates that showed an acceleration of evolution, the selective constraints may have been relaxed to some extent, but not completely lost. In both the NF-YB and NF-YC subunits, the preponderance of conservative and moderate amino acid replacements with regard to composition of the side chain, isoelectric point and molecular volume indicates the effects of negative selection, while the relatively high proportions of radical and very radical changes in polar requirement and polarity suggest a relaxed functional constraint on amino acid replacements in relation to these properties. One possible exception is the change in hydropathy in the NF-YC duplicate genes. A significantly higher proportion of very radical amino acid replacements with respect to hydropathy was detected in these duplicates, suggesting a directional change in amino acid sequences. This suggests the presence of positive selection, which has acted to promote changes in amino acid hydropathy to a greater extent than expected under random substitution.
It is an intriguing question whether each of the different members is able to interact with all other subunits. Gusmaroli et al. (2001, 2002) predicted that trimer formation should be possible among all members of the three subunits. At the same time, however, they showed that not all the NF-Y duplicates in A. thaliana are expressed ubiquitously. Some members are either organ-specific or developmentally regulated. We compared the amino acid substitution patterns between the ubiquitous and tissue-specific members, and failed to find any differences in sequence that are correlated with distinct expression patterns. This is possibly because only partial sequences of these genes were used in the analyses. However, the asymmetric expression of subfunctionalized NF-Y duplicates is probably driven by the differential evolution of regulatory regions, rather than coding sequences (Conant & Wagner, 2003). The specific intermolecular interactions between different members may also play a role in determining the distinct expression patterns of duplicate NF-Y genes. Therefore, more biochemical evidence is needed to verify whether all NF-Y subunits can indeed associate.