Contribution of subgenomes to the transcriptome and their intertwined regulation in the allopolyploid Coffea arabica grown at contrasted temperatures

Authors


Summary

  • Polyploidy has occurred throughout the evolutionary history of plants and led to diversification and plant ecological adaptation. Functional plasticity of duplicate genes is believed to play a major role in the environmental adaptation of polyploids. In this context, we characterized genome-wide homoeologous gene expression in Coffea arabica, a recent allopolyploid combining two subgenomes that derive from two closely related diploid species, and investigated its variation in response to changing environment.
  • The transcriptome of leaves of C. arabica cultivated at different growing temperatures suitable for one or the other parental species was examined using RNA-sequencing. The relative contribution of homoeologs to gene expression was estimated for 9959 and 10 628 genes in warm and cold conditions, respectively.
  • Whatever the growing conditions, 65% of the genes showed equivalent levels of homoeologous gene expression. In 92% of the genes, relative homoeologous gene expression varied < 10% between growing temperatures.
  • The subgenome contributions to the transcriptome appeared to be only marginally altered by the different conditions (involving intertwined regulations of homoeologs), suggesting that C. arabica's ability to tolerate a broader range of growing temperatures than its diploid parents does not result from differential use of homoeologs.

Introduction

Polyploidy has occurred throughout the evolutionary history of eukaryotes, especially in flowering plants, which have all undergone at least one round of polyploidization (Jiao et al., 2011). Many major agricultural crop plants including wheat (Triticum aestivum), cotton (Gossypium hirsutum), sugarcane (Saccharum officinarum) and coffee (Coffea arabica) are allopolyploid, that is, they combine two or more sets of distinct genomes (homoeologous chromosomes) after hybridization of diverged genomes and genome duplication. Other crops such as potato (Solanum tuberosum) and alfalfa (Medicago sativa) are autopolyploids derived from duplication of the same genome (Chen & Ni, 2006; Jackson & Chen, 2010). Polyploidy considerably influences plant species diversity, giving rise to novel phenotypes and leading to ecological diversification and colonization of new niches (Otto & Whitton, 2000; Adams, 2007; Madlung, 2013). Recent studies documented rapid and dynamic changes in genomic structure and gene expression in plant polyploids (Comai, 2005; Buggs et al., 2011). Genomic plasticity (Leitch & Leitch, 2008) enables polyploids to combine two divergent genomes of the parental diploid species in the same cell (Chen, 2007; Hegarty & Hiscock, 2009), whereas functional plasticity of duplicate genes enables them to regulate gene expression to adapt to their environment (Jackson & Chen, 2010).

Numerous studies have investigated variations in gene expression by comparing the expression of the diploid parents and of the allopolyploids. In the additive model, allopolyploids are supposed to display mid-parent expression value. However, many exceptions have been found in Arabidopsis (Wang et al., 2006), Gossypium (Chaudhary et al., 2009; Rapp et al., 2009), Senecio (Hegarty et al., 2006, 2008) and coffee (Bardil et al., 2011), suggesting that differential regulation of gene expression is a common feature of allopolyploid plants. Changes in homoeologous expression have been investigated in one to several dozen genes in different organs of allopolyploids (Adams et al., 2003, 2004; Mochida et al., 2003; Nomura et al., 2005; Chaudhary et al., 2009; Flagel et al., 2009; Buggs et al., 2010a) and linked to different types of stress (Liu & Adams, 2007; Stamati et al., 2009; Dong & Adams, 2011; Combes et al., 2012). More recently, studies have been performed at the genome scale in a single organ (Flagel et al., 2008; Hovav et al., 2008; Buggs et al., 2010a; Vidal et al., 2010; Higgins et al., 2012; Ilut et al., 2012; Yoo et al., 2012). In allopolyploids, homoeologous genes may exhibit unequal expression patterns (i.e. homoeolog expression bias; Grover et al., 2012) either locally or throughout the genome. The ratio of homoeologous expression can vary with the organ, its development, and in response to abiotic stresses. At the genome scale, differentially expressed homoeologous genes have been observed in some allopolyploids, preferentially toward one of the two subgenomes (i.e. unbalanced homoeolog expression bias; Grover et al., 2012) (Buggs et al., 2010b; Vidal et al., 2010; Ilut et al., 2012). The differences observed in the studies of allopolyploids indicate that no single unifying factor governs genome-specific expression biases.
Evolution of genome-specific expression occurs via a unique ad hoc mixture of genetic and epigenetic regulatory mechanisms within different species (Chaudhary et al., 2009). A multiple model approach is needed to study how allopolyploid plants respond to the merger of genomes (Hegarty, 2011).

Coffee is one of the most important agricultural commodities and ensures the livelihood of > 80 million people worldwide. Among the roughly one hundred Coffea species identified (Davis et al., 2006), two, C. arabica and C. canephora, are cultivated species and account for 65% and 35% of world coffee production, respectively. Coffea is a relatively recent genus estimated to have originated 0.5 million yr ago (Anthony et al., 2010). Coffea species have variable phenotypes and are able to grow in specific environments. The low sequence divergence and phylogenetic analysis of Coffea species suggest a rapid and radial mode of speciation (Lashermes et al., 1997; Cros et al., 1998; Anthony et al., 2010). All Coffea species are diploid, except C. arabica, which is allotetraploid (2n = 4x = 44) and derived from a recent (< 50 000 yr ago) interspecific hybridization between two diploid species: C. eugenioides and C. canephora (Lashermes et al., 1999; Cenci et al., 2012). Homoeologous genomes in C. arabica are designated Ea and Ca according to their parental origin. The two parental species are closely related, and the two subgenomes have low sequence divergence (i.e. 1.3% average difference for genes, Cenci et al., 2012). The evolution of the C. arabica genome after the polyploidization event has not yet been investigated but almost perfect gene synteny between the two subgenomes has been shown for several genomic regions (Yu et al., 2011; Cenci et al., 2012). Nevertheless, C. arabica displays disomic inheritance with bivalent pairing of homoeologous chromosomes (Lashermes, 2000). The two parental species exhibit different agroecological adaptations. C. canephora and C. eugenioides are endemic in regions where the annual mean temperature ranges from 22 to 26°C and from 18 to 23°C, respectively, with no major oscillations (DaMatta & Ramalho, 2006; Davis et al., 2006). By contrast, C. arabica, whose optimum mean annual temperature ranges from 20 to 24°C, can be grown in regions with marked variations in thermal amplitude. Studies have already been conducted to analyze the expression of genes in C. arabica in different growing conditions. First, the genome-wide expression patterns of C. arabica and its related parental species were compared and, for some genes, genomic expression dominance (i.e. genomic expression level dominance; Grover et al., 2012) modulated by the growing conditions (Bardil et al., 2011) was revealed. Second, the relative expression of homoeologs of C. arabica was analyzed in 13 genes located in the same genomic region. Analysis of relative homoeologous expression revealed moderate variations across organs and growing conditions (Combes et al., 2012).

The aim of this study was to assess the contribution of subgenomes to the C. arabica transcriptome and to identify variations related to adaptation to different growing conditions. Transcriptome of leaves of C. arabica plants grown in contrasted temperature conditions was sequenced. RNA-sequencing (RNA-seq) data enabled detection of SNPs (single nucleotide polymorphisms) between the two subgenomes, and quantification of transcribed homoeolog abundance. To this end, the allopolyploid reads were first aligned against a C. canephora reference transcriptome of 56 216 unigenes resulting from deep RNA-sequencing. Then the genome origin of the allopolyploid SNPs was inferred by comparing the SNPs with available sequences of alleles of the two current diploid progenitor species. These data enabled characterization of genome-wide homoeologous gene expression in C. arabica, and investigation of the influence of growing temperatures on homoeologous gene expression. The relationships between homoeologous gene expression and total gene expression were also explored. Thanks to the recent origin of C. arabica, the relative homoeologous gene expression in the allopolyploid was then compared with differential gene expression in modern diploid progenitor species. This comparison provided information on changes in gene expression that were attributed to genome merger and caused by cis- and trans-regulatory divergences between the two parental diploid species and, hence, between the two subgenomes (Chaudhary et al., 2009; Shi et al., 2012). To our knowledge, this study of variation in homoeologous gene expression in two contrasted growing conditions is the first to be carried out at the genome scale in an allopolyploid. In addition to advancing knowledge on overall homoeologous gene expression in allopolyploids, new features connected with C. arabica's ability to adapt to environmental variations were also revealed.

Materials and Methods

Plant material

Seedlings of Coffea arabica L. var. Caturra, an inbred line, were cultivated for 2 months in two sets of contrasted growing conditions with different diurnal/nocturnal temperatures: 30°C : 26°C (warm conditions) and 23°C : 19°C (cold conditions). The conditions were chosen to suit either Coffea canephora or C. eugenioides and did not prevent normal growth of the C. arabica plants, even though the conditions were more stressful for one or the other of the parent species. A lower temperature regime different from that of Bardil et al. (2011) was used to better suit the requirements of C. eugenioides and, consequently, to compare more contrasting temperature regimes. In the two climatic chambers, the photoperiod was set at 12 h d−1, humidity at 80–90% and luminosity at 600 μmol photon m−2 s−1. In each climatic chamber, the plants were grown in a randomized complete block design. At the same time, 3 h after the beginning of the diurnal period, young, fully expanded leaves were harvested from two plants (biological replicates) in each growing chamber. Plants of C. eugenioides (genotype DA58 from the IRD collection) were cultivated in a glasshouse (26°C ± 3°C daytime and 24°C ± 3°C night-time temperatures, humidity maintained at over 60%). After collection, the samples were immediately flash frozen in liquid nitrogen and stored at −80°C until RNA extraction.

RNA sequencing

Total RNA was isolated from 1 g of material from each plant. Harvested samples were ground in liquid nitrogen and the powders suspended in 20 ml of an extraction buffer (5 M guanidinium isothiocyanate, 31 mM sodium acetate (pH 8), 1% β-mercaptoethanol, 0.88% (w/v) N-lauroyl sarcosine and 1% (w/v) polyvinylpyrrolidone 40). The solutions were centrifuged at 15 000 g for 20 min at 4°C. RNA was purified on a 5.7 M cesium chloride layer (18 000 g for 20 h at 20°C). The RNA pellets were rinsed twice with 70% (v/v) ethanol and dissolved in 100 μl of RNAse-free water. The quality and the concentration of extracted RNA samples were determined using the Agilent DNA1000 (Agilent, Santa Clara, CA, USA).

The five mRNA libraries were built and sequenced at the MGX platform (Montpellier Genomix, Institut de Génomique Fonctionnelle, Montpellier, France). The RNA libraries were constructed using the ‘RNA-seq sample prep’ kit (Illumina, San Diego, CA, USA). RNA was sequenced using the Illumina SBS (sequence by synthesis) technique on a Hiseq 2000 (Illumina). Image analysis, base calling and quality filtering were performed using Illumina software. For the C. eugenioides library, 32 million reads were obtained, while for the four C. arabica libraries, a total of 326 million reads (ranging from 64 to 88 million reads per library) was retained.

RNA-seq data processing

For each sample, the 72-nt single reads were aligned using BWA (Li & Durbin, 2009) against a C. canephora expressed sequence tag (EST) assembly (with 56 216 unigenes) as transcriptome reference. This reference base was built with transcripts from ‘SGN coffee unigene build III’ (16 046 clusters, SOL), ‘Brazilian Coffee Initiative’ (16 801 clusters) (Vidal et al., 2010) and ‘French C. canephora sequencing consortium’ (52 683 clusters). The first two resources are the result of Sanger sequencing (c. 78 470 ESTs from various tissues), the third of deep Illumina sequencing (c. 118 million 76 nt reads from total RNA isolated from leaf, stem and flower tissues). These three resources were assembled using CAP3 software (http://seq.cs.iastate.edu/) with default parameters. Redundancy among contigs was investigated using BLAST (i.e. all-against-all sequence comparison). Duplicates (i.e. > 95% identity on > 97% of sequence length) were removed, and only the longest sequences kept. A total of 56 216 nonredundant clusters with an average length of 663 bp were retained. A maximum of four mismatched nucleotides (including gaps) between the read and the reference transcriptome sequence was allowed. The reads were also mapped to a C. eugenioides transcriptome reference (a subset of 8351 unigenes) obtained from the C. eugenioides transcriptome sequenced by Illumina technology and mapped to the C. canephora transcriptome reference.
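
The redundancy filter described above (duplicates defined as > 95% identity on > 97% of sequence length, longest sequence kept) can be illustrated with a minimal Python sketch. This is not the pipeline used in the study: the function name, the input layout (a dictionary of sequence lengths and a list of pairwise hits) and the choice of measuring the 97% coverage against the shorter sequence of each pair are assumptions made for illustration.

```python
def remove_redundant(lengths, hits, min_identity=95.0, min_coverage=0.97):
    """Keep the longest member of each group of near-identical sequences.

    lengths: {sequence_id: length in bp}
    hits: iterable of (query_id, subject_id, percent_identity, aligned_length),
          e.g. parsed from an all-against-all comparison (hypothetical layout).
    Assumption: coverage is measured against the shorter sequence of the pair.
    """
    duplicates = {}
    for query, subject, identity, aligned_length in hits:
        if query == subject:
            continue  # ignore self-hits
        shorter = min(lengths[query], lengths[subject])
        if identity > min_identity and aligned_length > min_coverage * shorter:
            duplicates.setdefault(query, set()).add(subject)
            duplicates.setdefault(subject, set()).add(query)

    kept = []
    # Visit sequences from longest to shortest so the longest copy is retained.
    for seq_id in sorted(lengths, key=lambda k: (-lengths[k], k)):
        if not any(other in duplicates.get(seq_id, ()) for other in kept):
            kept.append(seq_id)
    return kept
```

The greedy longest-first pass is one simple way to honour the "only the longest sequences kept" rule; other clustering strategies would also satisfy the description.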

Data analysis

From the selection of the unigenes up to the data analysis, all statistical analyses and data processing were performed with R software.

Estimation of transcript level

Mapped sequence counts were processed using the ‘DESeq’ package (Anders & Huber, 2010) to estimate the transcript level. This method is particularly suitable for experiments with biological replicates. The count data were normalized on the total number of counts taking the variance and the mean of the biological replicates into account. For each sample, the normalized expression level for each unigene of the reference transcriptome was expressed as reads per kilobase of transcript sequence per million reads (RPKM) (Mortazavi et al., 2008). For each unigene, the fold change in gene expression between the two growing conditions was analyzed and its statistical significance estimated using a p-value adjusted for multiple testing (Benjamini–Hochberg method).
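
For readers unfamiliar with the two quantities named above, the following Python sketch shows the standard RPKM formula and a plain Benjamini–Hochberg adjustment. It does not reproduce the DESeq normalization or its statistical test; the function names are hypothetical and chosen for illustration only.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

As an example, a 663-bp unigene receiving 663 reads in a library of one million mapped reads has an RPKM of 1000.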

Estimation of homoeologous gene expression

Different categories of SNPs can be found in allopolyploids that present a high level of genome complexity, including sequence variations between subgenomes (homoeologous-SNPs) that co-exist with allelic variations within subgenomes (homologous-SNPs) (Kaur et al., 2012). For SNP discovery, the aligned allopolyploid reads were analyzed with the GATK toolkit (McKenna et al., 2010) using the Unified Genotyper module to obtain the SNP list and allelic data, and the Depth Of Coverage module to obtain depth of coverage information. To infer the SNP genome origin and distinguish homoeologous SNPs, the detected allopolyploid SNPs were compared to the corresponding nucleotides in both modern parental diploid genomes, C. canephora and C. eugenioides, using SNiPloid, a newly developed tool (M. Peralta et al., unpublished, http://sniplay.cirad.fr/cgi-bin/sniploid.cgi). Notably, SNiPloid allows the user to set a minimum depth of coverage for a sequence position to be considered. Minimum depths of coverage of 20 and 4 were required for the samples of C. arabica and C. eugenioides, respectively. A read cutoff of 20 was selected to account for the possibility of a bias in homoeolog expression in the allopolyploid.
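
The principle of inferring SNP origin by comparison with the two diploid genomes can be sketched as follows in Python. This is an illustrative simplification and not SNiPloid's actual logic, which is not described here: the function name, its arguments and the returned labels are hypothetical, and only the two depth thresholds (20 and 4) come from the text.

```python
def classify_snp(arabica_alleles, canephora_base, eugenioides_base,
                 arabica_depth, eugenioides_depth,
                 min_arabica_depth=20, min_eugenioides_depth=4):
    """Label one allopolyploid SNP position (simplified, hypothetical rule).

    A position is called 'homoeologous' when the two alleles seen in
    C. arabica are exactly the two different bases carried by the diploids.
    """
    if (arabica_depth < min_arabica_depth
            or eugenioides_depth < min_eugenioides_depth):
        return "insufficient depth"
    alleles = set(arabica_alleles)
    if (canephora_base != eugenioides_base
            and alleles == {canephora_base, eugenioides_base}):
        return "homoeologous"
    return "other"
```

Under this rule, a position where C. arabica shows A and G, C. canephora carries A and C. eugenioides carries G is attributed to the divergence between subgenomes, with A assigned to Ca and G to Ea.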

For each sample, homoeologous SNP sites were used to calculate a site homoeologous expression ratio. For each unigene, a weighted mean of site homoeologous expression ratios was calculated. Unigenes containing only one homoeologous SNP site, or with different site homoeologous expression ratios (prop.test with Bonferroni correction, P < 0.001) were discarded. Finally the biological replicates were combined by averaging the weighted means of each unigene. Unigenes displaying > 20% of variation between the two homoeologous gene expression estimations, or < 80 total reads per unigene were discarded. The parameters were selected to ensure a reliable estimate of homoeologous gene expression. The data and the unigenes of the warmer and the cooler conditions were treated using the same method. Homoeologous gene expression was deduced from the results by considering unigenes as equivalent to genes. The homoeologous gene expression corresponding to homoeolog-specific read counts of the total read counts was expressed as the percentage of the Ca homoeolog in the total gene expression.
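
A minimal Python sketch of the per-unigene estimate is given below. Several details are assumptions made for illustration because the text does not specify them: site ratios are weighted by site read depth, the 80-read threshold is applied to each replicate, and the 20% criterion is read as an absolute difference in percentage points between replicates. The site-heterogeneity filter (prop.test) is omitted, and all function names are hypothetical.

```python
def unigene_pct_ca(sites):
    """Depth-weighted mean %Ca over homoeologous SNP sites of one unigene.

    sites: list of (ca_reads, ea_reads) tuples, one per SNP site.
    Returns None when fewer than two usable sites are available.
    """
    sites = [(ca, ea) for ca, ea in sites if ca + ea > 0]
    if len(sites) < 2:
        return None
    total_depth = sum(ca + ea for ca, ea in sites)
    # Each site ratio ca / (ca + ea) is weighted by its depth (assumption).
    weighted_sum = sum((ca + ea) * (ca / (ca + ea)) for ca, ea in sites)
    return 100.0 * weighted_sum / total_depth


def combine_replicates(rep1_sites, rep2_sites, max_diff=20.0, min_reads=80):
    """Average the two replicate estimates, or return None if filtered out."""
    estimates = []
    for sites in (rep1_sites, rep2_sites):
        depth = sum(ca + ea for ca, ea in sites)
        pct = unigene_pct_ca(sites)
        if pct is None or depth < min_reads:
            return None
        estimates.append(pct)
    if abs(estimates[0] - estimates[1]) > max_diff:
        return None
    return sum(estimates) / 2.0
```

For instance, a unigene with sites (30, 30) and (60, 40) in one replicate (56.25% Ca) and two sites of (50, 50) in the other (50% Ca) would be retained with a combined estimate of 53.125% Ca.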

Comparison of subgenome and genome specific expression between the allopolyploid and its diploid progenitor species

By comparing subgenome-specific expression in the allopolyploid with the genome-specific expression in the diploids, using the method applied by Wittkopp et al. (2004) to alleles of interspecific hybrids in the group of Drosophila species, it is possible to separate expression variation into cis- and trans-origin. Even though we had no synthetic F1 hybrids of Coffea diploid species, these analyses were possible because only a slight sequence divergence was observed between the two constitutive genomes of C. arabica and those of its modern parental species (Lashermes et al., 1999, 2010; Cenci et al., 2012). In theory, differences in genome-specific expression due to cis-regulatory divergences between the diploids and shared by the allopolyploid will fall along a 1 : 1 diagonal, whereas, when present in the allopolyploid, trans-regulatory divergences will balance genome-specific expression and instead fall along a horizontal line of equivalent expression. Genes that do not fall along either of these lines are assumed to be modified by a combination of changes in cis- and trans-factors. Microarray data from C. canephora and C. eugenioides obtained in warmer growing conditions (Bardil et al., 2011) were used to apply this method to C. arabica. Microarray and RNA-seq data were found to be highly correlated (r = 0.63, P < 0.001) for 2900 unigenes in the two studies. A set of 1434 unigenes displaying a significant difference in expression between the two diploid species (t-test, H0: C/E = 1, P < 0.01) was selected. The expression of these genes was analyzed using statistical tests: two-sided t-test (H0: Ca/Ea = 1, P < 0.01) and two-sided Wilcoxon test (H0: C/E = Ca/Ea, P < 0.01) to identify any cis- and trans-regulatory changes that occurred in the allopolyploid.
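
The way the two tests could be combined into regulatory categories can be sketched in Python as follows. The decision table is an assumption inferred from the geometric description above (diagonal for cis, horizontal line for trans) and from the categories reported in the Results; the function names are hypothetical, and the p-values are taken as inputs rather than computed.

```python
from math import log2


def cis_trans_components(diploid_ratio, homoeolog_ratio):
    """Split log2(C/E) into a cis part, log2(Ca/Ea), and a trans remainder."""
    cis = log2(homoeolog_ratio)
    trans = log2(diploid_ratio) - cis
    return cis, trans


def classify_divergence(p_homoeolog_bias, p_ratio_differs, alpha=0.01):
    """Classify a gene already known to differ between the diploids.

    p_homoeolog_bias: p-value for H0: Ca/Ea = 1
    p_ratio_differs:  p-value for H0: C/E = Ca/Ea
    """
    biased = p_homoeolog_bias < alpha    # the gene leaves the horizontal line
    differs = p_ratio_differs < alpha    # the gene leaves the 1 : 1 diagonal
    if biased and not differs:
        return "cis"
    if differs and not biased:
        return "trans"
    if biased and differs:
        return "cis + trans"
    return "undetermined"
```

A gene with C/E = 4 and Ca/Ea = 2 would, on the log2 scale, carry one unit of cis divergence and one unit of trans divergence.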

In order to improve our knowledge of the relationships between gene expression and homoeologous gene expression in an allopolyploid, we compared the mean of relative homoeologous gene expression of genes showing expression level dominance (i.e. genes whose expression was similar to that of one of the two diploids and up- or downregulated with respect to the other parent; 891 genes common to the current study and the study of Bardil et al. (2011)) to the mean of relative homoeologous gene expression of all the genes studied (t-test, P < 0.01). Likewise, to test if the expression level dominance characteristic of these genes is linked to a particular type of regulatory divergence between diploid species, the distribution of gene expression divergence (cis, trans, cis + trans combination) was compared to the distribution of gene expression divergence of the whole set of genes.

Gene ontology functional enrichment analysis

Computational mapping and Plant GO-slim annotation were performed using Blast2GO software v2.6.4 (Conesa et al., 2005; Conesa & Götz, 2008) with default parameters on the 12 060 unigene sequences for which the relative expression of homoeologs was estimated either in the warmer or in the cooler conditions. For the genes selected at different steps of the analysis, molecular and cellular functions were catalogued using enrichment tests (FDR < 0.05) to evaluate statistical significance.

Results

Identification of homoeologous SNPs by RNA-seq

Among all the SNPs detected (from 218 538 to 271 947 SNPs depending on the sample concerned), a total of 82 844 and 91 528 (77 920 found in both growing conditions) were considered as homoeologous SNPs for the warmer and cooler conditions, respectively. Following the selection steps described in the Materials and Methods section, 9959 and 10 628 genes were retained for homoeologous gene expression analysis in the warmer and colder growing conditions, respectively, including 8527 genes found in both growing conditions.

Homoeologous gene expression throughout the genome

The primary goal of this study was to investigate homoeologous gene expression among plant samples cultivated in two different growing conditions. For each of the two growing conditions, the genes were classified according to the percentage of the Ca homoeolog relative to total gene expression in ten categories ranging from 0% to 100% (Fig. 1). Relative homoeologous gene expression was evaluated with an average confidence interval length of 8.5% at the 95% confidence level. The distribution patterns of genes in different categories were similar in the two growing conditions (χ2 = 10.5217, df = 7, P = 0.1609). The proportion of genes exhibiting a bias, whatever its direction, was almost the same: 37.5% and 34% for the warmer and cooler conditions, respectively (prop.test with Bonferroni correction, H0: Ca/Ea = 1, P < 0.001). A very low percentage of genes (0.02% for the warmer temperature and 0.04% for the cooler temperature) showed extremely biased homoeologous gene expression (categories: 0–10% and 90–100% of Ca) toward one subgenome. The average percentage of the Ca homoeolog was 52.5% and 52.0% for the warmer and cooler temperatures, respectively.
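
The classification into ten %Ca categories and the comparison of the two resulting distributions can be sketched in Python as below. The function names are hypothetical, the assignment of boundary values (for example, exactly 10% placed in the 10–20% category) is an assumption, and only the χ2 statistic is computed; obtaining the P-value would require a χ2 distribution function from a statistics library.

```python
def pct_ca_category(pct_ca):
    """Return the index (0-9) of the 10%-wide category containing pct_ca."""
    if not 0 <= pct_ca <= 100:
        raise ValueError("pct_ca must lie between 0 and 100")
    return min(int(pct_ca // 10), 9)  # 100% falls in the last category


def category_counts(pct_values):
    """Count genes per %Ca category for one growing condition."""
    counts = [0] * 10
    for pct in pct_values:
        counts[pct_ca_category(pct)] += 1
    return counts


def chi_square_statistic(counts_a, counts_b):
    """Pearson chi-square statistic for a 2 x k table of category counts."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand_total = total_a + total_b
    statistic = 0.0
    for a, b in zip(counts_a, counts_b):
        column_total = a + b
        if column_total == 0:
            continue  # empty categories contribute nothing
        for observed, row_total in ((a, total_a), (b, total_b)):
            expected = row_total * column_total / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

Categories that are empty in both conditions are skipped, which is consistent with a test having fewer degrees of freedom than the number of categories minus one.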

Figure 1.

Relative homoeologous gene expression in Coffea arabica in two different growing conditions (warm, light grey bars; cold, dark grey bars). C. arabica genes were classified according to the relative expression of their homoeologs represented by the contribution of subgenome Ca to total gene expression (%Ca). The exact number of genes in each category is indicated at the top of each column.

Comparing the symmetric categories of the distribution revealed that, whatever the growing conditions, more genes showed biased homoeologous gene expression toward Ca than toward Ea. A mapping-related technical bias was investigated; the analysis of the reads of C. arabica cultivated in the cooler condition was repeated using a C. eugenioides reference transcriptome instead of a C. canephora reference transcriptome. The nature of the transcriptome reference used for mapping induced a slight technical bias (Supporting Information Fig. S1); however, the two relative homoeologous gene expression estimates were comparable and highly correlated (Fig. S2).

Gene ontology enrichment analysis was performed on genes showing biased and extremely biased homoeologous gene expression toward Ea or Ca. This analysis provided little information: only a few GO terms showing statistically significant enrichment were revealed (Table S1); however, GO terms linked to ‘enzymatic activity’ were over-represented, whereas genes with GO terms linked to ‘binding functions’ were under-represented.

Homoeologous gene expression in response to different growing conditions

Variation in homoeologous gene expression between the two growing conditions

For each of the 8527 genes analyzed in both growing conditions, the relative homoeologous gene expression in the warmer condition was compared to the relative homoeologous gene expression in the cooler condition. Figure 2 shows the distribution of the variation in the observed relative expression of the homoeologs in the two growing conditions whatever the direction of the variation. In 92% of the genes, homoeologous gene expression varied < 10% between the two growing conditions; of the rest, variation was between 10% and 20% in 7.6% of genes and > 20% in 0.4% of the genes. For the last two subsets of genes, shifts in the relative homoeologous gene expression favored the Ca or Ea homoeolog in 3.9% and 3.7% of genes, respectively. Homoeologous gene expression between the two growing conditions was highly correlated (r = 0.87 with P < 0.001) (Fig. 3).
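
The comparison between conditions amounts to an absolute difference in %Ca per gene plus a correlation across genes, which can be sketched in Python as follows. The function names are hypothetical and the treatment of the exact boundaries (10% and 20% placed in the middle class) is an assumption.

```python
from math import sqrt


def variation_category(pct_ca_warm, pct_ca_cold):
    """Bin the absolute shift in %Ca between the two growing conditions."""
    difference = abs(pct_ca_warm - pct_ca_cold)
    if difference < 10:
        return "< 10%"
    if difference <= 20:
        return "10-20%"
    return "> 20%"


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(var_x * var_y)
```

Applying `variation_category` to every gene and tallying the labels gives the kind of distribution summarized in Fig. 2, while `pearson_r` on the paired %Ca values corresponds to the coefficient reported with Fig. 3.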

Figure 2.

Variation in the relative homoeologous gene expression in Coffea arabica in the two growing conditions (warm and cold). The genes were sorted by differences in relative homoeologous gene expression whatever the direction of the variation. The exact number of genes in each category is indicated at the top of each column.

Figure 3.

Correlation between the relative homoeologous gene expression measured in Coffea arabica in the two growing conditions (warm and cold).

Gene ontology enrichment analysis was performed on genes showing > 10% of variation in relative homoeologous gene expression between the two growing conditions. This analysis revealed GO terms showing statistically significant enrichment. As in the previous analysis, genes described by GO terms linked to ‘enzymatic activity’ were over-represented, while genes described by GO terms linked to ‘binding functions’ were under-represented (Table S2). Gene ontology enrichment analyses performed on genes whose relative homoeologous gene expression varied either toward Ca or Ea did not reveal statistically significant GO term enrichment.

Relationships between variations in relative homoeologous gene expression and total gene expression

To look for a relationship between homoeologous gene expression and total gene expression, homoeologous gene expression and gene expression were investigated in 9959 and 10 628 genes in warmer and cooler growing conditions, respectively (Fig. S3). Whatever the growing conditions, no correlation was found between these two variables (for the warmer condition: r = −0.01, P = 0.2988; for the cooler condition: r = −0.02, P = 0.03304). In addition, variations in the level of gene expression (log2 fold change of total expression between the warmer and cooler growing conditions) and of homoeologous gene expression between the same growing conditions (%Ca in the warmer condition vs %Ca in the cooler condition) were compared. A very low correlation coefficient (r = 0.06, P < 0.001) was observed, suggesting that the variation in total gene expression is not dependent on the variation in relative homoeologous gene expression (Fig. 4). No statistically significant GO term enrichment was revealed by gene ontology enrichment analysis performed on subsets of genes showing variations in the level of gene expression and variations in the relative homoeologous gene expression.

Figure 4.

Relationships between the relative homoeologous gene expression and total expression variations observed in Coffea arabica in two growing conditions. For each C. arabica gene studied in two growing conditions, the difference in the relative homoeologous gene expression (%Ca in warmer condition – %Ca in cooler condition) is plotted against the change in total expression (log2 of expression in colder condition / expression in warmer condition) (r = 0.06, P-value = 2.135e−07).

Homoeologous gene expression and gene expression in the diploid progenitor species

Cis- and trans-regulatory homoeologous gene expression

In 1434 genes, we examined the cis- and trans-acting regulatory changes by plotting the ratio of gene expression of the diploid species (C/E) against the relative expression of homoeologous genes in the allopolyploid (Ca/Ea) (Fig. 5). Amongst these genes, 32% and 14% showed cis-regulatory and trans-regulatory divergences, respectively, that occurred at a time between the two diploid species and the two subgenomes. Finally, 54% were affected by a combination of cis- or trans-regulatory divergences that occurred at a time between diploid species and both subgenomes. The last group was composed of 21% of genes showing a combination of cis- and trans-regulation divergences and 33% of genes for which it was difficult to statistically disentangle the cis- and trans-regulatory divergences.

Figure 5.

Comparison of the relative homoeologous gene expression in Coffea arabica with the gene expression ratio between its diploid progenitor species. For differentially expressed genes of the diploid species, the ratio of gene expression of the diploid species (C/E) is plotted against the relative expression of homoeologous genes in the allopolyploid (Ca/Ea) on a logarithmic scale. (a) All genes studied are represented irrespective of the regulatory changes. Genes showing cis-regulatory divergences, trans-regulatory divergences, undetermined regulatory differences, and a combination of cis and trans-regulatory divergences are presented in (b), (c), (d), (e), respectively. The diagonal lines indicate 100% cis-regulatory divergence; the horizontal lines indicate 100% trans-regulatory divergence.

Homoeologous gene expression and patterns of differential expression between the allopolyploid and the diploid parental species

The average percentage of the Ca homoeolog in the subset of genes showing expression level dominance like the diploid species C. eugenioides was 50%, not significantly lower (t = −1.7115, df = 57.632, P = 0.09237) than 52%, the average percentage of homoeologous Ca of all the genes studied. In the subset of genes showing expression level dominance like the diploid species C. canephora, the average percentage of the Ca homoeolog was 56%, significantly higher (t = 8.0928, df = 523.093, P = 0.001) than the average percentage of homoeologous Ca of all the genes studied. Genes in the expression level dominance category did not present a particular type of regulation divergence (χ2 test = 6.8686, df = 3, P = 0.0762).

Discussion

Homoeologous gene expression throughout the genome

Even though RNA-seq analysis of polyploid plant transcriptomes is challenging, both the genome and the gene expression of polyploids have been successfully analyzed using strategies based on sequence similarity to determine the origin of the transcript (Buggs et al., 2010b, 2012; Bombarely et al., 2012; Higgins et al., 2012; Ilut et al., 2012). Here, we report results of genome-wide homoeologous gene expression analysis of allopolyploid C. arabica plants in two contrasting growing conditions. First, the sequences were mapped using a transcriptome reference from one of its diploid progenitors; then, homoeologous SNPs were identified by comparing allopolyploid SNPs with the allele sequences of its modern diploid parental species. Due to the use of a diploid transcriptome reference, this method was shown to induce a technical bias, but one which appeared to have no consequences for the relative homoeologous gene expression estimates. For the first time, relative homoeologous gene expression was assessed in 9959 pairs of homoeologous genes in warm growing conditions (day : night temperatures: 30°C : 26°C) and in 10 628 pairs of homoeologous genes in cold growing conditions (day : night temperatures: 23°C : 19°C). An earlier study analyzed homoeologous gene expression in 2646 contigs of C. arabica by combining EST data from various libraries produced from seeds/berries, leaves and flowers in uncontrolled growing conditions (Vidal et al., 2010). However in most cases, the number of ESTs that cover homoeologous genes was too low to reliably assess relative homoeologous gene expression.

The present characterization of the genome-wide homoeologous gene expression of C. arabica complements previous studies performed in different allopolyploids (Buggs et al., 2010b; Bombarely et al., 2012; Higgins et al., 2012; Ilut et al., 2012). Whatever the growing conditions, 65% of genes studied here showed an equivalent level of homoeologous gene expression, and 35% showed biased homoeologous expression. Taking into consideration possible variations due to different methodological parameters between studies, the proportion of genes exhibiting biased homoeologous gene expression is similar to that previously reported (i.e. 22%) for C. arabica by Vidal et al. (2010), as well as to that estimated for several natural allopolyploids: 25% in Gossypium hirsutum (Adams et al., 2003), 27% in Nicotiana tabacum (Bombarely et al., 2012) and 22% in Tragopogon miscellus (Buggs et al., 2010b). These estimates are, however, higher than the 15% observed in synthetic Brassica napus allopolyploids (Higgins et al., 2012). This discrepancy could be related to an increase in homoeologous expression bias over evolutionary time, as reported by Flagel & Wendel (2010). In some allopolyploids, a subgenome can contribute preferentially to the transcriptome of the allopolyploid (i.e. unbalanced homoeolog expression bias). In Glycine dolichocarpa, Ilut et al. (2012) showed that the same subgenome was mostly affected by unbalanced homoeologous gene expression. In Tragopogon miscellus, analyzed using a method based on a diploid transcriptome reference, a similar unbalanced homoeolog expression was reported; however, this result should be interpreted with caution due to a possible mapping-related technical bias, as mentioned by the authors (Buggs et al., 2010b). In C. arabica, neither of the two subgenomes was preferentially expressed. In both growing conditions, the mean relative homoeologous gene expression was almost 50%.

Gene ontology enrichment analysis of biased homoeologous genes did not distinguish genes with specific functions preferentially expressed by one or the other of the two subgenomes. However, when all genes exhibiting biased expression were considered together, gene ontology enrichment analysis did reveal slight over-representation of genes linked to ‘enzymatic activity’ and slight under-representation of genes linked to ‘binding functions’. In C. arabica, a recent allopolyploid, observation of biased homoeologous gene expression may indicate a first step in an evolutionary process such as subfunctionalization. Indeed, functional diversification of duplicated genes may be a major feature of the long-term evolution of polyploids (Blanc & Wolfe, 2004). Even if most duplicate genes are rapidly lost on an evolutionary timescale (Lynch & Conery, 2000), some homoeologous genes are retained depending on the dosage sensitivity of the gene and the gene function (Birchler et al., 2005; Freeling & Thomas, 2006; Birchler & Veitia, 2007; Veitia et al., 2008). In Arabidopsis thaliana, transcription factors and signal transduction genes have been preferentially retained, which is not the case for other genes with enzymatic functions (Blanc & Wolfe, 2004).

Homoeologous gene expression regulation

In a new allopolyploid, both the level and the pattern of gene expression are affected by the merging of two divergent genomes of the parental diploid species in the same cell (Osborn et al., 2003; Chen & Ni, 2006; Chen, 2007). Differences in gene expression between homoeologs can result from cis- and trans-regulatory changes. Several cis- and trans-regulatory models of homoeologous gene expression in allopolyploids have been proposed (Riddle & Birchler, 2003; Pignatta & Comai, 2009). Cis- and trans-regulatory changes have been distinguished in C. arabica, as well as in cotton (Chaudhary et al., 2009) and in Arabidopsis (Shi et al., 2012). Whereas in cotton and Arabidopsis, predominant cis-regulatory effects mediating gene-expression divergence were observed, in C. arabica, the differential expression between homoeologous genes is explained exclusively by cis-regulatory differences in less than half the genes studied. In the remaining genes, the differential expression between homoeologous genes is explained by a combination of cis-regulatory effects and trans-regulatory effects. Therefore, because the two subgenomes of C. arabica display low sequence divergence and the variations in homoeologous gene expression are limited and consistent across growing conditions, it is likely that homoeologous trans-regulatory factors interact and control the regulation of the transcription of the two subgenomes. For most genes, overall expression appeared to be the result of two homoeolog expressions regulated by intertwined mechanisms. This high proportion of combined cis- and trans-regulatory divergences between diploid species and between the two subgenomes observed in C. arabica is congruent with the rapid radial speciation of Coffea species (Cros et al., 1998; Anthony et al., 2010). During adaptive radiation, rapid morphological diversification may be accompanied by accelerated rates of evolution of regulatory genes (Barrier et al., 2001) in which trans-regulatory divergences between diploid species are more often observed (Wittkopp et al., 2004).
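The distinction between cis- and trans-regulatory divergence discussed above is commonly drawn by comparing two expression ratios: the ratio between the diploid parents, which reflects cis plus trans effects, and the ratio between homoeologs within the allopolyploid, where both copies share the same trans environment so that the ratio reflects cis effects only. The sketch below illustrates this general framework; it is not the statistical procedure of the present study. The function names, the fixed log2 threshold and the expression values are hypothetical, and real analyses test these ratios statistically on replicated counts.

```python
import math


def cis_trans_effects(parent_ca, parent_ea, homoeolog_ca, homoeolog_ea):
    """Return (cis, trans) effects as log2 expression ratios.

    cis   = log2(Ca / Ea) between homoeologs in the allopolyploid
    trans = log2(Ca / Ea) between the diploid parents, minus cis
    """
    parental = math.log2(parent_ca / parent_ea)
    cis = math.log2(homoeolog_ca / homoeolog_ea)
    return cis, parental - cis


def classify(cis, trans, threshold=0.5):
    """Label regulatory divergence using a hypothetical fixed threshold."""
    has_cis = abs(cis) >= threshold
    has_trans = abs(trans) >= threshold
    if has_cis and has_trans:
        return "cis + trans"
    if has_cis:
        return "cis only"
    if has_trans:
        return "trans only"
    return "conserved"


# Hypothetical expression values (illustration only)
cis_a, trans_a = cis_trans_effects(400, 100, 200, 100)
cis_b, trans_b = cis_trans_effects(200, 100, 200, 100)
print(classify(cis_a, trans_a))  # parental ratio exceeds homoeolog ratio
print(classify(cis_b, trans_b))  # parental and homoeolog ratios agree
```

In this scheme, a gene whose parental divergence is fully reproduced between homoeologs falls in the "cis only" class, whereas any remaining difference is attributed to trans-acting factors.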

In allopolyploids preferentially expressing one of the two subgenomes (i.e. unbalanced homoeolog expression bias), such as Gossypium allopolyploids and Glycine dolichocarpa, regulation models have been proposed which suggest that cis- and trans-regulatory divergences between subgenomes may play a role in the transgressive expression detected in allopolyploids in comparison with diploids (the level of expression in allopolyploids is higher or lower than in diploids; Rapp et al., 2009). In studies comparing Gossypium allopolyploids and diploid F1 hybrids with their diploid progenitors, it was also hypothesized that the genome preferentially expressed in allopolyploids is causally connected with the unbalanced expression level dominance of one parental species after genome merger (Flagel & Wendel, 2010; Yoo et al., 2012). In C. arabica, for genes previously classified in the expression level dominance category by a microarray study (Bardil et al., 2011), we found no relationship between this characteristic and biased homoeologous gene expression or specific variations in cis- or trans-regulatory effects, perhaps because of intertwined regulation mechanisms between homoeologous genes.

Homoeologous gene expression in response to different growing conditions

Even though our understanding of gene expression in polyploids has increased in the last few years, the mechanisms that lead to increased responsiveness of polyploids in stressful environments remain to be investigated. The application of abiotic stress considerably impacts homoeologous gene expression in polyploids (Liu & Adams, 2007; Dong & Adams, 2011). The homoeologs in the allopolyploid could be used differentially for different or variable responses to an array of stressful conditions (Yoo et al., 2012; Madlung, 2013). The present work was a whole-genome study of the variation in homoeologous gene expression in C. arabica in contrasting growing conditions that involve adaptation constraints for C. arabica. Unlike in the previously cited studies, neither of the growing conditions disrupted the relative homoeologous gene expression or produced harmful effects. Whatever the growing conditions, neither of the two subgenomes was preferentially expressed and no cases of gene silencing were observed, with the caveat that in our analytical conditions, only gene silencing that depended on growing conditions would be revealed. We did not observe relative homoeologous gene expression biased toward Ca in the warmer condition or toward Ea in the cooler condition. Furthermore, using gene ontology enrichment analysis, it was not possible to differentiate groups of genes showing a variation in biased homoeologous gene expression toward Ea or toward Ca. At the genome-wide scale, C. arabica's ability to tolerate a broader temperature range than its diploid parents appears not to be related to the overall higher expression of the homoeologous genes deriving from the parental species best suited to one or the other temperature. Furthermore, for most of the genes, no relationship appears to link variation in total gene expression and variation in relative homoeologous gene expression. Whereas total gene expression changed, almost no change in relative homoeologous gene expression was observed. Given the intertwined regulation mechanisms between homoeologous genes, this outcome underlines the complexity of the regulation of homoeologous genes in a doubled network of interconnected genes and regulatory elements. To further advance our understanding of the adaptive abilities of polyploids, the regulatory mechanisms of genes that simultaneously exhibit variation in total gene expression and in relative homoeologous gene expression could be investigated, and the effect of the relative homoeologous gene expression of specific genes on adaptation could be examined in detail using ecological and controlled approaches. The analysis of post-transcriptional events can also be envisaged, as studies suggest that mRNA stability may be involved in the adaptation of polyploids to stress (Kim & Chen, 2011).
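The stability of relative homoeologous gene expression between growing conditions, discussed in this section, can be summarized as the share of genes whose Ca percentage shifts by less than a given amount between the warm and cold conditions. The sketch below assumes that the 10% limit given in the Summary refers to percentage points and that only genes assessed in both conditions are compared; the function name and the values are hypothetical.

```python
def fraction_stable(warm, cold, max_shift=10.0):
    """Fraction of shared genes whose Ca percentage shifts by < max_shift points.

    warm, cold: dicts mapping gene name -> Ca percentage in each condition.
    Returns None when no gene was assessed in both conditions.
    """
    shared = warm.keys() & cold.keys()
    if not shared:
        return None
    stable = sum(1 for gene in shared if abs(warm[gene] - cold[gene]) < max_shift)
    return stable / len(shared)


# Hypothetical Ca percentages per condition (illustration only)
warm = {"g1": 50.0, "g2": 62.0, "g3": 30.0, "g4": 48.0}
cold = {"g1": 53.0, "g2": 49.0, "g3": 33.0, "g5": 51.0}

stable_share = fraction_stable(warm, cold)
print(f"{stable_share:.2f} of shared genes vary by less than 10 points")
```

Genes assessed in only one condition (here `g4` and `g5`) are ignored, which mirrors the fact that the numbers of gene pairs assessed in the warm and cold conditions differ.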

Taken together, the main results of the present study compared to those of previous studies on allopolyploids underline the particularities of C. arabica, a recent allopolyploid between two closely related diploid species. This outcome illustrates, on the one hand, the importance of considering the time of allopolyploidization and the evolution of progenitor species when interpreting the results of analyses; and, on the other, the need to examine multiple allopolyploid species in order to fully understand the mechanisms that lead to the adaptive abilities of allopolyploids.
