Tandem duplication of the FLC locus is subject to gene loss and purifying selection in the A. thaliana-related species A. arenosa and A. lyrata
Gene duplication has long been recognized as a major mechanism for the generation of new gene functions and the origin of adaptive evolutionary novelties (Ohno, 1970; True & Carroll, 2002; Irish & Litt, 2005). Gene duplication arises through whole-genome duplication (WGD), local tandem duplication, and/or retrotransposition (Wolfe, 2001; Nei & Rooney, 2005; Casneuf et al., 2006; Xiao et al., 2008). Tandemly duplicated genes are considered to be younger than genes duplicated by WGD (Rizzon et al., 2006; Kliebenstein, 2008). In plants, c. 16% of genes in A. thaliana and 14% of genes in Oryza sativa are tandemly duplicated (Rizzon et al., 2006). Tandem gene duplication occurs frequently in disease gene clusters and is mainly caused by unequal crossing over between homologous chromosomes or by unequal sister chromatid exchange (Tilley & Birshtein, 1985; Meyers et al., 1998; Leister, 2004; Sanchez-Gracia et al., 2009). Tandem gene duplicates may be lost over time or develop new functions (Lynch & Conery, 2003) through the processes known as neofunctionalization (Force et al., 1999) and subfunctionalization (Lynch & Force, 2000). These changes may occur through modifications of protein-coding sequences, as well as of noncoding regulatory sequences that are associated with expression diversity. Some novel adaptive traits, including heavy metal resistance in A. halleri (Hanikenne et al., 2008), boron tolerance in barley (Sutton et al., 2007) and submergence tolerance in rice (Xu et al., 2006), have been associated with tandem gene duplicates.
Our previous study indicated that two FLC alleles are present in A. arenosa and contribute to flowering time variation in A. arenosa and resynthesized allotetraploids (Wang et al., 2006a). Here we show the structure and organization of the FLC loci in A. arenosa and A. lyrata. Phylogenetic analysis of FLC homologues suggests that tandem duplication of FLC occurred before the divergence of A. arenosa and A. lyrata (Fig. 3b,c). Interestingly, the tandem duplication event has had different consequences in the two species. Only the FLC duplication is retained in A. arenosa, whereas the segmental duplication involving three genes (with FLC in the middle) remains traceable in A. lyrata. The Ka : Ks values of the FLC neighboring genes in A. arenosa and A. lyrata relative to A. thaliana are 0.29 and 0.33, respectively, for ORF25 and 0.09 and 0.21, respectively, for ORF27, indicating purifying selection on these neighboring genes. In A. arenosa, regional homologous recombination between the tandem arrays may have led to deletion of the two neighboring genes in the vicinity of FLC. Alternatively, because A. arenosa is a naturally outcrossing autotetraploid, the additional genome copies may compensate for the loss of genes in this genomic region.
The origin and diversification of the FLC duplicates are similar to those of the MADS AFFECTING FLOWERING (MAF) genes, which form a monophyletic clade that is closely related to FLC in the MADS-box gene family (Alvarez-Buylla et al., 2000). Arabidopsis thaliana has five MAF genes, MAF1–5 (Ratcliffe et al., 2001). MAF1 is a floral repressor, but unlike FLC, MAF1 expression is not affected by vernalization (Ratcliffe et al., 2003). The other members, MAF2–5, are tandemly arrayed within c. 24 kb on chromosome 5. Although MAF2–3 underwent positive selection, MAF4 and MAF5 were subjected to purifying selection (Caicedo et al., 2009), indicating functional diversification. Moreover, MAF3–4 expression is downregulated, whereas MAF5 is upregulated, after a short period of cold treatment, indicating expression divergence of the duplicate genes in response to vernalization (Ratcliffe et al., 2003). Therefore, tandem gene duplication may lead to functional and expression diversity and adaptive variation.
Gene fragmentation has also been reported in the Brassica oleracea genome, which diverged from A. thaliana c. 20 Mya (Town et al., 2006). Out of 177 annotated genes in a triplicated region spanning c. 2.2 Mb of the B. oleracea genome, fragments of 10 colinear genes were identified. Together, the results indicate that sequence diversification and loss are associated with the evolution of tandemly duplicated genes in many plants, including the Brassicaceae.
Unlike the neighboring genes, the FLC duplicates are very well preserved. The Ka : Ks ratio ranges from 0.23 to 0.44, indicating purifying selection against mutation or loss of FLC during evolution. Indeed, null FLC alleles have not been found in A. thaliana ecotypes or related species (Michaels & Amasino, 2001), suggesting that FLC is an essential gene for maintaining flowering-time variation. How and why selection pressures act differently on FLC and on its neighboring genes, which result from the same duplication event, remains an interesting question (Rizzon et al., 2006). Tandemly duplicated genes appear to undergo different selection pressures from WGD-duplicated genes because tandem duplication may duplicate only one component of a regulatory network, leading to imbalance (Gaut & Ross-Ibarra, 2008). The majority of tandemly duplicated genes were found to encode membrane proteins and proteins that respond to abiotic and biotic stresses (Rizzon et al., 2006). Genes encoding membrane proteins, stress-responsive genes, and flowering-time genes tend to act near the end of biological pathways, which may explain their preservation after tandem duplication (Gaut & Ross-Ibarra, 2008).
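The interpretation of the Ka : Ks ratios quoted above can be illustrated with a minimal sketch. It applies the conventional reading that ratios well below 1 suggest purifying selection, ratios near 1 suggest neutral evolution, and ratios above 1 suggest positive selection. The function name `classify_selection` and the `tolerance` band around 1 are assumptions made for illustration only; they are not part of the analysis reported here, and the sketch does not estimate Ka or Ks from sequence data.

```python
def classify_selection(ka_ks: float, tolerance: float = 0.05) -> str:
    """Return a coarse label for a Ka/Ks ratio.

    `tolerance` is a hypothetical band around 1 that is treated as
    neutral; it is an assumption of this sketch, not a value from the study.
    """
    if ka_ks < 0:
        raise ValueError("Ka/Ks cannot be negative")
    if ka_ks < 1 - tolerance:
        return "purifying"
    if ka_ks > 1 + tolerance:
        return "positive"
    return "neutral"


# Ratios quoted in the text for the FLC neighboring genes and the FLC duplicates.
reported = {
    "ORF25": [0.29, 0.33],
    "ORF27": [0.09, 0.21],
    "FLC duplicates (range)": [0.23, 0.44],
}

# Label every reported ratio with the same rule.
labels = {
    gene: [classify_selection(value) for value in values]
    for gene, values in reported.items()
}

for gene, result in labels.items():
    print(gene, result)
```

Under this simple rule every reported ratio falls in the purifying category, although the FLC duplicates and their neighbors differ in how far below 1 they lie.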
The coding sequences of FLC homologues are highly conserved but their putative regulatory elements are highly diverged
Coding sequences of FLC homologues are well conserved, suggesting functional constraints on FLC, which inhibits early flowering. In contrast to the coding region, the upstream regulatory regions (1.5–2.0 kb) are highly divergent among all FLC homologues, except for a small block (c. 340 bp) near the core promoter (Fig. 4 and data not shown). The minimal functional promoter includes the 5′ UTR (109 bp for AtFLC) and is highly conserved among all FLC homologues (Sheldon et al., 2002; Wang et al., 2006a). Interestingly, a regulatory block in the 5′ region is found in AaFLC1 but not in AaFLC2 (Fig. 4), which may contribute to the upregulation of AaFLC1 and the repression of AaFLC2 in the resynthesized and natural allotetraploids (Wang et al., 2006a). A regulatory block with similar sequence is also located in the first intron of all FLC homologues except AaFLC3. This regulatory block has been shown to be involved in FLC expression and the vernalization response in A. thaliana (Sheldon et al., 2002). The presence of an extra copy of the regulatory block in the promoter region of AaFLC1 may be related to the tandem duplication between AaFLC1 and AaFLC3, which involves the first exon and the first intron (Fig. 2b). This 5′ upstream regulatory block of AaFLC1 may serve as an extra enhancer element for the upregulation of AaFLC1 in the allotetraploids as well as in response to vernalization.
Collectively, the sequence and functional analyses of FLC in A. thaliana-related species (Wang et al., 2006a) suggest that the FLC1 and FLC2 duplicates in A. arenosa and A. lyrata share the same function as FLC in A. thaliana. The presence of different 369-bp regulatory regions in the FLC homologues of A. thaliana and its closely related species indicates that FLC expression is likely to be regulated differently in these species. Tandemly duplicated genes tend to diverge rapidly in expression (Casneuf et al., 2006), and duplicated genes are easily released from the original selection pressures, providing adaptive and developmental novelty. The different patterns in the regulatory sequences among FLC duplicates may reflect expression diversity. If the FLC expression pattern is conserved among duplicates, dosage effects would predominate. Whether the FLC duplicates exert dosage effects, developmental regulation and/or rapid responses to the environment needs to be determined empirically.
The novel MADS gene AaFLC3 is generated through exonization of intronic sequence
AaFLC3 is generated through exonization of intronic sequences. A substitution mutation (A→G) in AaFLC3 created a splice acceptor site for exon 2, followed by a new stop codon (TAA) (Fig. 5a). Exonization of intronic sequences, particularly those originating from repetitive sequences such as Alu or other retrotransposable elements, has been found in human (Makalowski et al., 1994), rodent (Wang et al., 2005), dog (Wang & Kirkness, 2005) and many other vertebrates (Alekseyenko et al., 2007). The molecular mechanisms associated with exonization of intronic sequences have been well documented for the evolution of Alu elements inserted into the intronic region between two exons (Lev-Maor et al., 2003; Sorek et al., 2004). The overall number of mutations in Alu elements is remarkably small, and the mutations are mostly concentrated in the 5′ and 3′ splice-site regions. Almost all exonized Alu elements in the human genome are alternatively spliced, and only a small fraction of transcripts contain the new exons (Sorek et al., 2002). There is no evidence for the presence of transposable elements in AaFLC3. AaFLC3 was highly expressed in leaves of A. arenosa and of the allotetraploid Allo733, which was derived from hybridization between A. thaliana (Ler) and A. arenosa tetraploids (Comai et al., 2000; Wang et al., 2004).
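The exonization mechanism described above (a single A→G substitution creating an AG acceptor dinucleotide, followed by an in-frame stop codon) can be sketched on a toy sequence. The sequence, the substitution position, and the helper names below are hypothetical and chosen only for illustration; they are not the AaFLC3 sequence. The sketch also assumes that the reading frame starts at the first base after the new acceptor, whereas in a real transcript the frame is set by the upstream exon.

```python
from typing import List, Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def acceptor_sites(seq: str) -> List[int]:
    """Return the start index of every AG dinucleotide (candidate 3' splice acceptor)."""
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]


def substitute(seq: str, pos: int, new_base: str) -> str:
    """Return a copy of `seq` with the base at `pos` replaced by `new_base`."""
    return seq[:pos] + new_base + seq[pos + 1:]


def first_stop(seq: str, start: int) -> Optional[int]:
    """Return the index of the first in-frame stop codon reading from `start`, or None."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical intron fragment with no AG dinucleotide.
ancestral = "CTTCAATTGCATTAACC"

# Hypothetical A->G substitution at index 5, turning "AA" into "AG".
derived = substitute(ancestral, 5, "G")

sites_before = acceptor_sites(ancestral)
sites_after = acceptor_sites(derived)

# Assumption: the new exon begins immediately after the AG dinucleotide.
exon_start = sites_after[0] + 2
stop_index = first_stop(derived, exon_start)

print("acceptors before:", sites_before)
print("acceptors after:", sites_after)
print("new exon:", derived[exon_start:stop_index + 3])
```

In this toy case the substitution introduces the only AG in the fragment, and reading from the base after it reaches a TAA stop after two codons, mirroring in miniature how a new acceptor site can yield a short, truncated coding exon.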
AaFLC3 is predicted to encode a MADS-box protein of 90 amino acids (Fig. 5b). The truncated AaFLC3 is a novel MADS-box gene without a K-box domain. The K-box is known to be responsible for protein–protein interaction during the formation of MADS-box transcription factor complexes (Kaufmann et al., 2005). The absence of the K-box may therefore affect protein complex formation relative to the original FLC protein. FLC is a type II MADS-box gene, which contains all four functional domains: the MADS box, the intervening domain, the K-box, and the C-terminal region (Kaufmann et al., 2005). AaFLC3 resembles a type I MADS-box gene, which contains only a MADS box and a C-terminal region. However, AaFLC3 contains one intron, unlike typical type I MADS-box genes, which do not have introns (De Bodt et al., 2003). It is likely that AaFLC3 is a type II MADS-box gene whose product may function like a type I MADS-box protein. Expression of AaFLC3 in A. arenosa and the allotetraploids suggests that AaFLC3 is functional. Whether AaFLC3 affects flowering time or plays a different role remains to be investigated. We suggest that the AaFLC3 MADS-box gene encodes a transcription factor involved in growth and development. The generation and maintenance of AaFLC3 after segmental duplication in A. arenosa may also suggest that AaFLC3 affects growth and developmental patterns that are specific to A. arenosa.