Regulatory divergence of flowering time genes in the allopolyploid Brassica napus

Polyploidy is a recurrent feature of eukaryotic evolution and has been linked to increases in complexity, adaptive radiation and speciation. Within angiosperms, such events occur repeatedly in many plant lineages. We investigated the role of duplicated genes in the regulation of flowering in Brassica napus. This relatively young allotetraploid represents a snapshot of evolution and artificial selection in progress. In line with the gene balance hypothesis, we find preferential retention of expressed flowering time genes relative to the whole genome. Furthermore, gene expression dynamics across development reveal diverged regulation of many flowering time gene copies. This finding supports the concept of responsive backup circuits being key for the retention of duplicated genes. A case study of BnaTFL1 reveals differences in cis-regulatory elements downstream of these genes that could explain this divergence. Such differences in the regulatory dynamics of duplicated genes highlight the challenges for translating gene networks from model to more complex polyploid crop species.

Many economically important crops exhibit extensive gene multiplication as a result of recent or ancestral polyploidy 1 , for example wheat (Triticum aestivum) 2 , cotton (Gossypium hirsutum) 3 , and oilseed rape (OSR, Brassica napus) 4 . The presence of multiple copies of a gene relaxes natural and artificial selective pressures on any one individual copy, facilitating the emergence of novel gene functions 5 . The resulting increase in variation can be exploited to breed crop varieties with desirable phenotypes 6 . The presence of multiple orthologues, however, hinders efforts to translate knowledge of gene function and, in particular of regulatory networks, from model to crop species. This is a consequence of not knowing which orthologue, if any, retains the same function as the corresponding gene in the model species, whether ancestral functions have been partitioned between them, or if a novel function has been acquired 7 .
The evolutionary fate of gene copies arising from a gene duplication event has been studied in a range of species [8][9][10][11] . There are two main classes of gene duplication events: small scale duplications and whole genome duplications (WGD) 5,7,[12][13][14] . These two types of duplication event can lead to different outcomes for gene copies 13 . Whilst gene redundancy has been reported to be evolutionarily unstable 7,15 , it is frequently observed 12,[16][17][18] . A proposed driver for the retention of duplicate genes is the maintenance of gene dosage, known as the gene balance hypothesis 14,[19][20][21][22][23] . Such dosage constraints may result if the gene product acts as part of a protein complex, where an incorrect stoichiometry of proteins can lead to the appearance of deleterious phenotypes 14 . WGDs maintain the original stoichiometry, resulting in duplicated, dosage sensitive gene orthologues being retained 14,20,23 . Conversely, small scale duplication of individual genes without their partners disrupts protein stoichiometry and disfavours gene retention 19 .
Simulations of the dynamics of gene duplication events suggest that genes whose products form protein complexes, such as those associated with kinase activity, transcription, protein binding and modification, and signal transduction, are preferentially retained in the genome for longer when copied in whole genome relative to small scale duplications 19,24 . Data from a range of species are consistent with gene dosage balance [25][26][27][28][29] , including studies focusing on gene retention in the Arabidopsis genome 12,24 . In Saccharomyces cerevisiae, genes retained following a WGD are enriched for those that in diploids have haploinsufficiency or overexpression phenotypes, suggesting that the dosage of these genes is important 9 . One expectation of the gene balance hypothesis, illustrated in S. cerevisiae 20 , is that duplicated genes are more likely to be coregulated 20,23 . This co-regulation fits with the concept of buffering against stochastic effects in development 30,31 . Studying the regulation of duplicated genes can therefore provide clues for understanding their retention in the genome.
The Brassica genus contains several diploid crop species derived from ancestors that underwent a genome triplication event 5 to 28 million years ago [32][33][34] . OSR is an allopolyploid resulting from the interspecific hybridisation of two diploid species, Brassica rapa and Brassica oleracea 4 . An important agronomic trait for all Brassica crops is flowering time [35][36][37][38] , as different growing regions require varieties with very different phenologies. Flowering time has been extensively studied in the model species Arabidopsis 39-41 , revealing that flowering time genes are involved in multiple interactions and that many are transcription factors 41,42 . Thus, following the gene balance hypothesis, in a polyploid such as OSR, we would expect orthologues of Arabidopsis flowering time genes to have been preferentially retained relative to other genes in the genome, analogous to previous results that show preferential retention of genes involved with the circadian rhythm in paleopolyploid B. rapa 43 . That aspects of flowering time control are conserved between the Arabidopsis and OSR 37,44,45 makes OSR an interesting and agronomically important model to investigate the evolution of gene function following gene multiplication.
Here we show that data from a transcriptomic time series (global gene expression in the first true leaf and shoot apex prior to and during the floral transition in OSR) support the prediction of preferential retention for flowering time genes in the genome (Figure 1). Through comparative gene expression and cluster analysis we demonstrate that the regulation of many flowering time gene homologues has diverged, suggesting this may be important for their retention. As an exemplar, using knowledge of cis-regulatory elements downstream of the Arabidopsis TERMINAL FLOWER 1 (AtTFL1) gene, we identify sequence variation that correlates with regulatory differences observed for orthologues of AtTFL1 in OSR. This case study highlights the importance of homologue expression dynamics in characterising gene regulation. The differences in BnaTFL1 expression dynamics between homologues suggests that, in addition to proposed gene dosage effects, regulatory divergence may be important for gene retention.

OSR exhibits genome level expression bias across tissue types
Previous reports have demonstrated genome dominance in polyploids 46-48 . To test whether this is the case for OSR, we collected gene expression data through the vegetative to reproductive transition in a doubled haploid (DH) line derived from the spring OSR variety Westar ( Figure 2).
We compared global expression differences between the A and C genomes in the apex and the first true leaf across all time points (Figure 3; Supplementary Figure 1). We find that the A genome has a greater proportion of highly expressed genes than the C genome. Conversely, for genes showing very low expression we find the opposite relationship ( Figure 3a). Similar distributions are found but are less pronounced when only OSR genes showing sequence conservation to annotated Arabidopsis genes are considered ( Figure 3b) and when the sample is further restricted to OSR flowering time genes (Figure 3c). In contrast to the tissue-specific genome bias demonstrated in cotton 49 , our results are consistent across the two tissue types and throughout the time series (Supplementary Figure 1).
To investigate A and C genome expression at the gene level, we compared pairs of homoeologous genes that we identified using synteny and sequence similarity 4 . We classified a homoeologous pair as showing biased expression toward one genome if that gene has an expression level (measured in Fragments Per Kilobase of transcript per Million mapped reads, FPKM) at least two-fold higher than its homoeologue. At the individual gene level, biased expression was observed towards both genomes, but with 1.5 to 2.0 times as many genes showing bias towards the C rather than the A genome (16.9% towards the C genome relative to 9.7% towards the A genome in the apex, and 15.2% compared to 8.2% in the leaf; Table 1). This pattern is consistent with the findings of Chalhoub et al. (2014) and is maintained across all time points (Supplementary Table 2). The distributions of fold expression changes reveal that homoeologous gene pairs exhibiting a 2 to 8-fold change are primarily responsible for the observed bias (Supplementary Figure 2). Therefore, the homoeologue-level analyses reveal expression bias towards both the A and C genomes that are consistent across the tissue types tested and result in an absence of genome dominance (Supplementary Table 2). At the whole genome level, however, we observe a bias towards the A genome. This discrepancy may be due to genes with low expression levels tending to lack homoeologue pair information (Supplementary Figure 3). Alternatively, this bias may reflect a known higher incidence of homoeologous exchanges in which C genome copies of individual genes are replaced by their A genome counterparts 50 .

OSR expresses a higher number of flowering time gene homologues relative to the whole genome
To test the prediction that flowering time genes are preferentially retained relative to the whole genome ( Figure 1), we evaluated whether this was the case for genes expressed during the floral transition. A gene was considered to be expressed if the maximal expression level during the developmental time series was equal to or exceeded 2 FPKM, with leaf and shoot apex tested separately. We assessed the distributions of annotated ( Figure 4a) and expressed OSR flowering time genes (Figure 4b and 4c). In both leaf and shoot apex (Figure 4b and 4c), a shift towards the expression of a higher number of flowering time gene copies relative to the whole genome can be observed. To test whether this observation was caused by the retention of circadian genes, as has been reported in B. rapa 43 , we repeated this analysis after removing this set of genes and found that the pattern remained (Supplementary Figure 4). This confirms the preferential retention of flowering time genes in OSR and suggests that the multiple orthologues of Arabidopsis flowering time genes retained in the genome could be functional.

Analyses of gene expression differences reveals regulatory divergence of retained flowering time genes in OSR
Having shown that genes involved in the control of flowering time are retained as multiple homologues in the OSR genome we next investigated their regulatory control. We first examined global gene tissue specificity and found that of the 45,048 genes expressed across the developmental time series, 16% show apex specific expression and 11% show leaf specific expression, with the rest (73%) exhibiting expression in both tissues (Supplementary Figure 6).
Focussing on annotated orthologues of Arabidopsis flowering time genes, 61% have at least one orthologue in OSR that is not expressed in the apex, compared to 69% in the first true leaf ( Figure 5).
We next used Weighted Gene Co-expression Network Analysis (WGCNA) to identify regulatory modules. WGCNA uses normalised expression data to cluster genes together based on their temporal expression profiles rather than expression levels per se. We used these cluster assignments to assess the regulatory control of flowering time gene homologues. Based on the premise of tight co-regulation of dosage-sensitive or functionally redundant genes 20,31 , our null hypothesis is that all OSR orthologues of an Arabidopsis flowering time gene will have similar expression patterns, leading to orthologues being in the same regulatory module (dashed lines in Figure 6). We found that most OSR flowering time genes (74% in apex, 64% in leaf) do not conform to this null hypothesis ( Figure 6). Thus, analysis of both the overall level of expression in both leaf and shoot apex and WGCNA reveal regulatory divergence between retained homologues of flowering time genes in OSR, suggesting regulatory variation between homologues.

Self-organising map based clustering captures different patterns of regulatory divergence for OSR orthologues of the flowering time genes AtTFL1, AtFT, and AtLFY
To further assess differences in regulation between gene homologues we analysed the divergence of expression over time. Whilst WGCNA assigns expression profiles to regulatory modules, the similarity between profiles is not quantified and genes that could be assigned to multiple regulatory modules are only assigned to a single module. Furthermore, WGCNA does not account for uncertainty in the RNA-Seq data in the assignment of regulatory modules. To address these issues, we employed a self-organising map (SOM) based sampling approach to assess expression profile divergence (Supplementary Figure 8).

Patterns of intergenic sequence conservation surrounding BnaTFL1 genes provide a potential explanation for the observed regulatory divergence
Downstream regulatory sequences of AtTFL1 in Arabidopsis have been shown to be important for spatiotemporal control of expression 61 . We therefore investigated whether similar variation could explain the distinct pattern of regulation displayed by the four BnaTFL1 orthologues. We analysed sequence conservation between OSR and Arabidopsis in the 5' and 3' intergenic regions surrounding BnaTFL1, identifying several conserved regions ( Figure 8). Focussing on areas previously identified as AtTFL1 cis-regulatory elements in Arabidopsis 61 , we find variation in the degree of sequence conservation between BnaTFL1 orthologues ( Figure 8a). Sequence conservation within regions II and IV of BnaTFL1.A10 and BnaTFL1.C3 suggests Arabidopsislike cis-regulatory elements are present downstream of these genes. These BnaTFL1 orthologues, that increase in expression during the floral transition, show high sequence conservation in region II. Conversely, BnaTFL1.Cnn and BnaTFL1.C2, which are not upregulated during the floral transition, lack sequence conservation in this region. Region II was found to be necessary for the upregulation of AtTFL1 during the floral transition in Arabidopsis 61 , which correlates with this result. Region IV may also be involved in the observed expression trace divergence between BnaTFL1 homologues, as this region was found to be important for the expression of AtTFL1 in the inflorescence meristem.
Sequence conservation within region III is below 50% in BnaTFL1.Cnn, whilst for the other three homologues it is 81%, 87%, and 78% for BnaTFL1.A10, BnaTFL1.C2, and BnaTFL1.C3, respectively. Interestingly, the range of significant sequence conservation in BnaTFL1.C2 (154 bases) and BnaTFL1.A10 (162 bases) is decreased compared to that of BnaTFL1.C3 (273 bases), potentially suggesting the cis-regulatory elements in the former two copies are incomplete. 61 identified additional regions conserved across species that were not experimentally implicated in the regulatory control of AtTFL1 (green shading in Figure 8).

Serrano-Mislata et al. (2016)
We observe sequence divergence in one of these regions, region G. Interestingly it is BnaTFL.A10 and BnaTFL1.C3, which exhibit expression profiles most like that of AtTFL1, that show sequence conservation in this region. BnaTFL1.A10 exhibits high sequence conservation relative to Arabidopsis across this entire region, while BnaTFL1.C3 shows conservation over ~50% of the region. As with regions II and IV, BnaTFL1.C2 and BnaTFL1.Cnn lack conserved sequence in region G. We also identified a region of conservation not annotated in the previous analysis of AtTFL1 cis-regulatory elements. This region, situated ~600 bp upstream of the transcription start site of AtTFL1, shows ~80% sequence conservation relative to Arabidopsis in BnaTFL1.A10, BnaTFL1.C2 and BnaTFL1.Cnn. In BnaTFL1.C3, sequence conservation in this region is ~55%.
To confirm the expression differences we observe between the BnaTFL1 orthologues we performed copy-specific RT-qPCR across the developmental time series (Figure 8b). The RT-qPCR results show good correspondence with the RNA-Seq results, confirming our findings.
Thus, using sequence conservation we determine the presence/absence of cis-regulatory elements downstream of the BnaTFL1 genes that may confer similar regulatory control in OSR as in Arabidopsis. BnaTFL1 orthologues contain different combinations of cis-regulatory elements, which have the potential to underlie the divergent expression traces they exhibit. interactions, leading to neo-and subfunctionalisation. WGDs are usually followed by a process of "diploidisation" 65 that includes genome downsizing 66 , chromosome rearrangement and number reduction 67 , and gene loss 68 . So, whilst many additional gene copies gained from WGD are likely to be lost over time, the analysis of genomic sequences has revealed that a significant number of duplicated genes are nevertheless present in the genomes of many species 12,[16][17][18] . For instance, in the Arabidopsis lineage around 30% to 37% of homoeologous gene duplicates have been retained 25,69 . Based on such observations, modelling studies have determined conditions under which duplicated genes can become evolutionary stable 14,30 . These ideas have given rise to the gene balance hypothesis, which states that dosage sensitive genes are preferentially retained in the genome after WGD, but tend to be lost after local duplication events 20,23 . Kinases, transcription factors and proteins that form part of a complex fall into this category. From the gene balance hypothesis, we might therefore expect that highly networked genes such as those that regulate flowering time 40,41,70 have been preferentially retained in the genome.

Discussion
This study determines the expression profiles of OSR genes prior to and during the floral transition. We compared expression profiles across development to infer whether orthologues of Arabidopsis flowering time genes retain similar patterns of regulation. Whilst our analysis reveals that a significant proportion of duplicated genes in OSR have divergent regulation ( Figures 5 and 6, Supplementary Figures 8b and 8c), it shows that the more recently combined homoeologues are frequently found in the same regulatory module (79% in the apex and 77% in the leaf). The finding of homoeologues tending to be co-regulated in allotetraploid OSR is intriguing, given the comparatively recent origin. An analysis of 2,000 pairs of paralogous genes in Gossypium raimondii, resulting from a 5-to 6-fold ploidy increase ~60 Mya, revealed more than 92% of gene pairs exhibited expression divergence 71 Table 4). To assess biological variation, a second RNA sample for five time points in both the leaf and apex were sequenced at a lower average coverage of 33 million reads per sample. Supplementary Table 1 summarises the sampling scheme and indicates the time points for which a second pool of samples was sequenced.

Gene model prediction and read alignment
Gene models are available for the Darmor-bzh reference genome sequence 32

Weighted gene co-expression network analysis
The weighted gene co-expression network analysis was carried out using the WGCNA library 84

Self-organising maps and the identification of regulatory modules
Self-organising maps (SOM) were generated using the k o h o n e n library 85 available for the R statistical programming language 78 . As with the WGCNA analysis, the data was filtered and normalised prior to carrying out the SOM analysis. The number of nodes used in the SOM was chosen based on the ratio where N is the total number of genes, S is the total number of SOM nodes, N c is the total number of genes assigned to SOM node c, x g is the expression vector for gene g, x c is the expression vector for SOM node c, and where erf is the error function defined as is the average probability of genes ݃ ଵ and ݃ ଶ mapping to the same cluster, is the standard deviation of the probabilities calculated from the 100 different SOMs used in the sampling procedure, and θ is the tissue specific threshold. A threshold of 0.053 (apex) or 0.056 (leaf) was used. This threshold was calculated by taking the self-clustering probability that corresponded to the maximum of the density curve (Supplementary Figure 10) for each SOM and averaging them.
An automated approach was taken to quantify the pattern of clustering coefficients between copies of the same gene. Clustering coefficients were subjected to a binary filter, such that coefficients above 0.5 were set to 1 and those below set to 0. Regulatory modules were defined as groups of genes where the binary clustering coefficients between all genes were 1.

Sequence conservation analysis of orthologues of AtTFL1 in OSR
Sequence upstream and downstream of the AtTFL1 gene was extracted from the AtGDB TAIR9/10 v171 Arabidopsis genome assembly located on PlantGDB 87 and from the Darmor-bzh reference genome sequence 4 . Regions of conserved sequence were identified using mVISTA from the VISTA suite of tools 88,89 . The alignment algorithm used was AVID 90 , which performed global pair-wise alignments for all sequences. Percentage sequence conservation was calculated using a 100bp sliding window.

Quantitative PCR of BnaTFL1 homologues
Reverse transcription quantitative PCR (RT-qPCR) was carried out on copies of TFL1 using custom designed primers (Supplementary Table 3). The SuperScript® III First-Strand Synthesis System (Thermo Fisher Scientific Inc., USA) was used to generate cDNA, with 2 μ g of RNA used as input. The RNA was extracted as described above. Each RT-qPCR reaction consisted of

Competing Financial Interests
The authors declare that they have no competing interests.