Divergence of annual and perennial species in the Brassicaceae and the contribution of cis‐acting variation at FLC orthologues

Abstract Variation in life history contributes to reproductive success in different environments. Divergence of annual and perennial angiosperm species is an extreme example that has occurred frequently. Perennials survive for several years and restrict the duration of reproduction by cycling between vegetative growth and flowering, whereas annuals live for 1 year and flower once. We used the tribe Arabideae (Brassicaceae) to study the divergence of seasonal flowering behaviour among annual and perennial species. In perennial Brassicaceae, orthologues of FLOWERING LOCUS C (FLC), a floral inhibitor in Arabidopsis thaliana, are repressed by winter cold and reactivated in spring conferring seasonal flowering patterns, whereas in annuals, they are stably repressed by cold. We isolated FLC orthologues from three annual and two perennial Arabis species and found that the duplicated structure of the A. alpina locus is not required for perenniality. The expression patterns of the genes differed between annuals and perennials, as observed among Arabidopsis species, suggesting a broad relevance of these patterns within the Brassicaceae. Also analysis of plants derived from an interspecies cross of A. alpina and annual A. montbretiana demonstrated that cis‐regulatory changes in FLC orthologues contribute to their different transcriptional patterns. Sequence comparisons of FLC orthologues from annuals and perennials in the tribes Arabideae and Camelineae identified two regulatory regions in the first intron whose sequence variation correlates with divergence of the annual and perennial expression patterns. Thus, we propose that related cis‐acting changes in FLC orthologues occur independently in different tribes of the Brassicaceae during life history evolution.

dataset. The corresponding annotated alignment is given in table S3. The combined dataset was divided into five unlinked partitions, to apply the best fitting evolutionary model to each locus. The respective models were chosen using jModeltest v0.1.1 (Posada and Crandall 1998), relying on the Akaike information criterion (AIC).
In the Bayesian analysis based on the combined dataset four simultaneous runs with four chains each were run for 1 million generations. In each run 1001 trees was sampled. When computing the consensus tree (50% Majority Rule) the first 25% of the trees sampled were discarded (burn-in = 250). Completion of the Bayesian runs was determined using the web-based program AWTY (Nylander et al. 2007). The chosen evolutionary models and applied parameters for all Bayesian analyses (combined dataset and single marker datasets) are given in table S4. For the phylogenetic reconstruction including both co-orthologues of ADH, CHS and At2g13360 in a concatenated alignment (introns deleted, alignment length 3163 positions, table S3) a Maximum Parsimony (MP) analysis was performed through the program MEGA (Tamura et al. 2011). A consensus tree was calculated from the two most parsimonious trees. Branches with less than 50% bootstrap support (1,000 replicates) were collapsed. The Close-Neighbour-Interchange method was used for search (Nei and Kumar 2000), 10 initial trees (random addition of sequences) were generated and the MP search level determining the extensiveness of the MP search was set to 3. The tree was calculated as unrooted but displayed as rooted using Pseudoturritis turrita as outgroup. All sequences used for phylogenetic analysis were submitted to genebank. The respective accession numbers are given in the accession table (table S1).
Life cycle of A. montbretiana under controlled conditions. The life history of A. montbretiana is of particular interest because it is a sister species of the model perennial A. alpina (reference accession Pajares, hereafter referred to as A. alpina acc. Pajares). Previously A. montbretiana was described as an annual plant (Mutlu 2004), however the life cycle was not described in detail under controlled conditions. Therefore growth and development of A. montbretiana were closely monitored under greenhouse conditions. After germination the plants grew as a rosette ( S12E). The A. montbretiana plants therefore did not survive seed production and were defined as semelparous and annual.
Generation of an Arabis montbretiana X Arabis alpina F1 hybrid. In order to investigate the flowering behavior of a hybrid of an annual and a perennial plant, A. montbretiana flowers were emasculated and cross-pollinated with A. alpina acc. Pajares pollen. Since several trials of reciprocal crosses did not yield any viable seed, immature siliques were harvested for embryo culture according to a protocol adapted for A. thaliana embryo culture (Sauer and Friml 2008) with some alterations like using Arabidopsis GM instead of AM. Only one embryo out of >100 rescued developing seeds started to develop further. It started producing chlorophyll and was transferred to GM medium containing 1% of sucrose. Cultivation with the plate turned upside down and shielded from light was continued until the embryo started developing leaves. Then it was transferred to fresh medium and left to grow in a Petri dish until it had a height of approximately 5 mm. Upon reaching this height it was transplanted to a jar covered with a lid and cultivated first on GM with phytagel, then on autoclaved sand drained with liquid GM and finally on autoclaved soil which was regularly watered with autoclaved water. Eventually the plant was transferred to greenhouse conditions (long day, 16h light). As the plant had reached a suitable size it was propagated by cuttings. Crosses with A. alpina as mother did yield any viable seeds at all.
Illumina sequencing of Arabis montbretiana, contig assembly, AmFLC contig detection and analysis. For A. montbretiana an Illumina library with an insert size of 400 bp suitable for paired end sequencing was generated at the Cologne Center for Genomics/University of Cologne/Germany. 187 million reads with an average length of 113 bp were obtained and subsequently loaded into the CLC Genomics Workbench Version 4.7. Settings were adjusted for library size and a de novo assembly was performed. The assembly resulted into ~29,000 contigs and after removing contigs with clear blast hits other than plant and contigs shorter than 200 bp the assembly had a total length of ~199 Mbp (28,775 contigs). The N50 was 21.6 kB and the longest contig had a length of 312 kB. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession LNCH00000000. The version described in this paper is version LNCH01000000. The resulting contigs were indexed as blast library which was then searched for the contig/contigs containing FLC. Depending on the analysis the obtained contig were trimmed to the region covering the FLC gene as well as the last exon of the upstream gene and the downstream sequence orthologous to what has been described as COOLAIR promoter (Swiezewski et al. 2009) in Arabidopsis, the sequence orthologous to the FLC gene including all exons and introns or only the sequence orthologous to the first intron of FLC. Sequences were aligned by GATA (Nix and Eisen 2005) for detecting duplications and rearrangements within the CDS (A. montbretiana versus A. alpina published in (Wang et al. 2009) and both orthologues against FLC (www.arabidopsis.org)), mVISTA (Frazer et al. 2004;Mayor et al. 2000) (AmFLC and FLC genomic region against A. alpina published in (Wang et al. 2009) covering partial (orthologue of) At5g10150, all intergenic sequence between (the orthologue of) At5g10150 and (the orthologue of) FLC and the COOLAIR promoter) or YASS (Noe and Kucherov 2005) (AmFLC genomic region and AmFLC cDNA). A diagram showing the structure of AaPEP1, AmFLC and FLC was generated manually using Adobe Illustrator. Percent similarity of DNA sequences was calculated by ClustalW (Goujon et al. 2010;Larkin et al. 2007;McWilliam et al. 2013). For monitoring the changes in SNP distribution among AmFLC and PEP1 across the locus AmFLC and PEP1 including all exons and introns were re-aligned by mVISTA and the resulting alignment was adjusted manually. SNPs between AmFLC and PEP1 were called. A sliding window (window size 200 bp, shift of 20 bp) was applied and the resulting data were plotted.
Details PEP1 duplication. In A. alpina acc. Pajares the duplicated segments can be differentiated by a 416 bp sequence that is only present in the copy of intron 1 adjacent to exon 1A. This 416 bp is also found in A. thaliana FLC. Furthermore, the A. alpina duplicated segments of intron 1 are differentiated by 4 SNPs of which 3 are located in the sequence 3' of the 416 bp segment while one is located just upstream of the 416 bp segment. The sequence alignments revealed that the first 950 bp of intron 1 in A. montbretiana contain the 416 bp sequence characteristic of intron 1A of PEP1 (fig .  S5D) and all 4 SNPs indicative of intron copy 1B (fig. S13). This combination suggests that AmFLC corresponds to the ancestral gene structure that contained both the 416 bp sequence and the SNPs characteristic of copy 1B in PEP1. In A. alpina duplication of the ancestral sequence followed by deletion of the 416 bp in the intron segment adjacent to exon 1B and 4 point mutations in the intron segment next to intron 1A produced the derived PEP1 structure present in A. alpina acc. Pajares (fig .  S5D).
Illumina sequencing of Arabis nordmanniana, contig assembly, AnFLC contig detection and analysis and estimation of genome size and ploidy level. The genome of A. nordmanniana was sequenced by Illumina sequencing of a 400 bp insert size library. Potential adaptor sequences were removed from the raw sequencing reads (~415 million reads) using Cutadapt (Martin 2011). The reads were further trimmed, by removing LEADING and TRAILING nucleotides with quality scores below 15 using Trimmomatic (Bolger et al. 2014). All reads shorter than 50 nucleotides were discarded. ~377 million out of the ~390 million cleaned reads matched in the de novo assembly of the A. nordmanniana genome and ~300 million of these reads could be assembled in pairs using the CLC Genomics Workbench Version 8.5. The total length of the assembly was ~342 Mbp (267228) after removal of contigs yielding blast hits other than plant or that were shorter than 200bp. The N50 was 4.9 kB and the longest contig had a length of 476 kB. This Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession LNCG00000000. The version described in this paper is version LNCG01000000. A blast database was generated from the contigs using the CLC Genomics Workbench Version 8.5. Searching the blast database generated from the contig assembly resulted in several blast hits to FLC or flanking genes. One contig contained the orthologue of At5g10150 as well as the partial (exon 1 until partial intron 6) A. nordmanniana orthologue of FLC (hereafter referred to as AnFLC A). A second contig could be identified as containing the remaining part of intron 6 as well as exon 6 and some sequence downstream of AnFLC A. Amplification of a fragment spanning exon 2 to 7 and sequencing of the end of the fragments confirmed that both contigs can be joined manually. Further analysis of the results of the blast search revealed a third contig that covered the orthologue of At5g10150, exon 1, intron 1, exon 2 and parts of intron 2 of the orthologue of FLC. Three more contigs were found to make up the remaining part (rest of intron 2, exon 3 to exon 7) of AnFLC. Several PCR reactions and subsequent sequencing of the PCR products were used to confirm that these contigs can be used to form a scaffold. Furthermore all four contigs were found to have overlaps (fig. S15) which may result from heterozygosity. For simplicity in the display of the complete alleles these overlaps were removed. Together these contigs make up a second copy of the orthologue of FLC hereafter referred to as AnFLC B. Aligning the manually scaffolded contigs to PEP1 and AmFLC showed that also in A. nordmanniana FLC is encoded by a single gene with 7 exons and 6 introns (alignment in table S14). Interestingly the duplication of the COLDAIR region was only found in AnFLC A and not in AnFLC B. However, it cannot be concluded if COLDAIR-like 1 (the duplicated copy) was lost in AnFLC B or if AnFLC B represents the ancestral form of AnFLC.
In order to find out if the presence of AnFLC A and AnFLC B is due to a duplication of AnFLC or the result of a whole genome duplication the genome size was estimated from the kmer-frequency distribution using a previously described method (Liu et al. 2014). In brief, the genome size can be estimated by the dividing the total number of kmers in the sequencing data by the expected kmer depth.
The expected kmer depth was obtained by first determining the abundance of individual kmers from the cleaned sequence reads using JELLYFISH (Marçais and Kingsford 2011) and then inspecting the kmer-abundance distribution (fig. S14). In order to limit the influence from low frequency kmers that are likely correspond to sequencing errors we removed all k-mers with an abundance  4. The total number of kmers in the cleaned sequence reads was 30,676,346,003 and the expected kmer-depth was found to be 44 (fig. S14). Based on these numbers the genome size was estimated to be around 697.18 Mb.
The A. nordmanniana assembly was annotated using the MAKER pipeline (Cantarel et al. 2008). As no transcriptomic sequence information was available, the genome was annotated using protein sequences from the following species downloaded from the Phytozome v10 (http://phytozome.jgi.doe.gov/pz/portal.html): A. thaliana, A. lyrata, Capsella rubella, Capsella grandiflora, Brassica rapa and Eutrema salseginium. In addition to these species, we also used proteins from A. alpina and A. montbretiana. The initial protein coding genes predicted by MAKER were filtered by performing BlastP (Altschul et al. 1997) searches (e-value threshold 1e -5 ) with the encoded proteins against the same protein database used during the annotation. Blast hits were only considered as valid if at least 60% of amino acids of the subject protein were represented in the alignment. A. nordmanniana genes without any significant hit were discarded.
Using this workflow we annotated a total of 41,610 protein-coding genes in A. nordmanniana. In order to determine the completeness of the predicted protein coding gene space, we searched the predicted proteins against a set of core-eukaryote genes (Parra et al. 2007) using BlastP (e-value threshold: 1e -20 ). Significant hits were found against 437 (˜95.4%) of the 458 A. thaliana proteins in the coreeukaryote gene set. This analysis suggests that the A. nordmanniana assembly covers a very large fraction of its gene space. The predicted proteomes of A.nordmanniana, A.alpina and A.montbretiana (personal communication) were clustered into 20840 potential gene-families using OrthoMCL (Li et al. 2003). Next, we searched groups in which the three species were represented in a 1:1:2 ratio. These groups were assumed to correspond to species-specific duplication events. In total, 4737 groups harbouring species-specific duplication were identified. The fraction of groups containing A. alpina specific duplications (145: ˜3.0%) was similar to the fraction of groups containing A. montbretiana specific duplications (160: ˜3.4%). The number of groups with A. nordmanniana specific duplications was substantially higher (4432: ˜93.6%). The genome size (approximately double the genome size of A. alpina as well as the high abundance of duplicated genes in A. nordmanniana make it likely that our accession is a tetraploid individual and that whole genome duplication rather than a single gene duplication is the reason for the presence of two copies of AnFLC. Amplification and sequencing of marker sequences. Four nuclear and two plastidic marker sequences were amplified and sequenced for use in phylogenetic reconstruction. Orthologues of the A. thaliana gene At2g13360 had been used successfully for phylogenetic reconstruction in the Brassicaceae (Duarte et al. 2010). In order to design primers specifically amplifying the locus in the Arabideae (and Pseudoturritis turrita as outgroup) the orthologue of At2g13360 was pulled out of the genome assembly of A. alpina acc. Pajares (Willing et al. 2015). Synteny of the respective genomic regions of A. thaliana and A. alpina was tested using the program GATA (Nix and Eisen 2005). When syntenie was confirmed primers were designed using the program primer3 (Rozen and Skaletsky 2000). Primer sequences, including the ones for the amplification of ITS, trnL, the trnL-F IGS, CHS and ADH are given in table S2. Marker sequences were amplified by PCR and sequenced directly or after cloning by Sanger sequencing. Details are given in supplementary methods and results.
The PCR mix contained 20-40 ng template DNA, 10 pmol of each primer, 2.5 nmol of each nucleotide, 1x My Budget or GoTaq PCR reaction buffer, 0.1 µl My Budget or Mango Taq polymerase and a final concentration of 2.5 mM MgCl2. Amplification followed a standard 3-step PCR protocol [2' initial denaturation at 95°C, 30 cycles of 30" denaturation at 95°C, 30" annealing at a primer pair specific temperature and 1' elongation at 72°C, followed by a final elongation step at 72°C for 5'].
Quality of PCR products was controlled by agarose gel electrophoresis. PCR products were purified using the Macherey & Nagel PCR Purification Kit according according to the manufacturer's protocol or the Promega Wizard SV Gel and PCR Clean-Up Kit prior to cloning (chs and adh fragments cloned into chemical competent E. coli strain JM109 using the pGEM-T vector system (Promega)) followed by Sanger sequencing using the universal T7 and SP6 primers or direct Sanger sequencing using the same primers as for the PCR reaction (forward and reverse) or the universal primers M13FP [5'-TGTAAAACGACGGCCAGT-3'] and M13RP [5'-CAGGAAACAGCTATGACC-3'].

. Relative expression of FLC orthologues in perennial A. alpina and tetraploid A. nordmanniana (allele A and B) and annual A.
montbretiana and A. auriculata. All examined orthologues are expressed before vernalization (0w = 0 weeks vernalization = 3 or 6 weeks at warm temeparture) and repressed by vernalization (12 wv = 12 weeks vernalization at 4 °C). However, only in the perennial species the FLC orthologues are derepressed after vernalization while in the annual species they stay stably repressed (12 wv + 2/3w = 12 weeks vernalization plus 2 or 3 weeks at warm temperature). Expression values are relative to the respective orthologues of PP2A or RAN3 (A. nordmanniana) and relative to 0 weeks (0w) of vernalization. AnFLC A and B were amplified together. Error bars represent standard deviation of technical qPCR replicates.