Natural genetic variation impacts expression levels of coding, non-coding, and antisense transcripts in fission yeast

Our current understanding of how natural genetic variation affects gene expression beyond well-annotated coding genes is still limited. The use of deep sequencing technologies for the study of expression quantitative trait loci (eQTLs) has the potential to close this gap. Here, we generated the first recombinant strain library for fission yeast and conducted an RNA-seq-based QTL study of the coding, non-coding, and antisense transcriptomes. We show that the frequency of distal effects (trans-eQTLs) greatly exceeds the number of local effects (cis-eQTLs) and that non-coding RNAs are as likely to be affected by eQTLs as protein-coding RNAs. We identified a genetic variation of swc5 that modifies the levels of 871 RNAs, with effects on both sense and antisense transcription, and show that this effect most likely goes through a compromised deposition of the histone variant H2A.Z. The strains, methods, and datasets generated here provide a rich resource for future studies.

I.
Accuracy of the genotyping Because RNA-seq only allows the sequencing of expressed genomic regions, only half of the polymorphic sites could be directly genotyped; the genotypes of the other site were inferred. To infer the missing values, we considered the flanking polymorphisms. Flanking markers were also used to filter out potential genotyping errors. If two flanking markers indicate inheritance from the same parent it is likely that also the intermediate genomic region was inherited from the same parent. A deviation from this assumption is possible if either i) two crossovers have occurred in between the two informative flanking sites (on both sides of the polymorphic site of interest), or ii) a non-crossover is implicated (non-crossovers can lead to small haplotype blocks), or iii) complex gene conversion events took place (Mancera et al, 2008), or iv) it results from genotyping errors. If the flanking markers are close, such patterns are more likely to denote an erroneous genotype. This pattern with flanking markers distant of less than 50 kb, appeared 34 times when considering all the strains of the library. Interestingly, we observed occurrences at a similar frequency in the progenitor strains, for which the genotype is obviously known. This suggests that most of the 34 observations probably correspond to genotyping errors. Moreover, we estimated the probability of a double recombination in a 50 kb interval as follows. There are approximately 30-50 crossovers per meiosis in S.pombe and fewer non-crossovers (Egel, 2004). In addition, when observing only one segregant of a meiotic tetrad, only one crossover out of two results in a recombination. Considering the above,, we based our estimation of the probability of double-recombinations on 100 recombinations per segregant, which should be an over-estimation (even for F2). We then considered that recombinations follow a Binomial distribution with p being the probability of recombination at each base (100 divided by the genome size), and the number of trials n set to 50,000. From that we calculated that the probability of observing two or more recombinations in 50kb was 0.0078. Thus, altogether it seems plausible that alternating genotypes for marker intervals less than 50kb represent genotyping errors (see Materials and Methods).
The same reasoning was applied to the imputation of missing genotypes. When the flanking polymorphisms show the same segregation pattern and are close enough, we assumed that no recombination event took place between them to infer the missing genotype. This strategy may lead to few missed small recombinations, non-crossovers or complex conversion tracts. However, the comparison of the expression levels obtained when aligning the RNA-seq reads on the strain specific genome (SSG) vs reference genome (RG) suggests that the imputation of the missing genotypes is almost entirely correct. Indeed, more reads were mapped on the SSG for 98.3% of the genes that showed differences of expression quantification in this comparison. It implies that the genotyping was correct for a large fraction of this subset of genes. This argument is even stronger since we showed many of the cases where expression appeared higher with reference genome mapping actually correspond to technical biases due to sequence homology (see below).
Note that, except rare cases (like (Bloom et al, 2013), genotyping of recombinant inbred populations for QTL studies has been carried out using microarrays (Atwell et al, 2010;Brem et al, 2002;Sinha et al, 2008) or microsatellite analyses (Williams et al, 2001), which led to a much lower genotyping precision than the one we achieved-here we directly sequenced almost half of the existing polymorphisms.

B. Genotyping and sequencing depth
When direct DNA sequencing is used to genotype an individual, the deeper the sequencing coverage, the more loci can directly be genotyped. However, this does not completely apply to RNA-seq based genotyping.
When RNA is sequenced, variant genotypes can only be called in transcribed regions. Although most of the fission yeast genome is transcribed, only a fraction of it is sufficiently expressed to reliably call the genotype.
In order to evaluate the influence of the sequencing depth on the genotyping, we have under-sampled the RNA-seq data from few samples at different scale. We then applied our complete analytic pipeline to the subsets to test the efficiency and the accuracy of the genotyping. Although the genotyping efficiency (i.e., the number of polymorphic site that can be genotyped) is affected by the sequencing depth (Supplementary Figure S19A), the accuracy of the genotype is good even at very low coverage Supplementary Figure S19B).
The genotyping capacity of the proposed method reaches relatively quickly saturation at an effective depth of ~20x (here, we defined the effective sequencing depth as being the number of bases uniquely mapped scaled to the genome size). Hence, the sequencing depth achieved in our study (37.5x, on average) is sufficient for genotyping. It can be seen as limiting factor to apply the method in organisms that have a bigger genome. The sequencing depth required for accurate transcript quantification (Tarazona et al, 2011;Toung et al, 2011) may actually be sufficient for genotyping also in other species with larger genomes.
Moreover, ~40% of the polymorphic sites that could not be directly genotyped correspond either to calls, which quality score felt below our threshold, or to ambiguous calls (heterozygous calls, see Methods). Such calls were not taken into account in our analysis because i) we favored accuracy over efficiency, and ii) they were not needed to successfully genotype the recombinant strains: the considerable sequencing depth of this dataset has led to saturation for almost all samples (Supplementary Figure S19B). There is therefore room for improving the direct genotyping efficiency by relaxing the threshold, using a genotype caller optimized for haploid genomes (note that the latest version of GATK (DePristo et al, 2011) implements this possibility), or by considering the genotype likelihoods of the different alleles of each variant to make dedicated genotype calls (a method close to this one using hidden Markov models has been proposed in (Bloom et al, 2013) for genotyping via pooled DNA sequencing). Such developments may reduce the required sequencing depth for RNA-seq based genotyping.
The RNA-seq based genotyping method that we propose requires that the haplotype blocks contain sufficient polymorphic expressed genes, and well characterized progenitors. Whereas the recombinant cross of interest is haploid or inbred, we think that this method is usable with limited adaptation. In the case of recombinant outbred lines of diploid organisms, allele specific expression and haplotype phasing make the RNA-seq based genotyping more complicated and would require substantial additional development.

II. Accounting for individual genomes improves transcript quantification
To assess whether sequence variations affect expression quantification in recombinant libraries and eQTL detection, we compared alignments against the reference genome (RG) vs alignments against an individualized, strain-specific genome (SSG). As expected, more reads could be mapped to the SSG than to the RG. Nine percent of all transcript quantifications (all strains and all traits) were differentially quantified (at least one read difference), and 1,756 traits (27.2%) were differentially quantified in at least one sample.
Only genes in which polymorphisms had been identified were expected to be affected by this bias. Indeed most genes containing a polymorphism (92.6%) were differentially quantified in at least one sample ( Figure   4a). Consistent with our expectations, most of the affected measurements (98.3%) were increased for the SSG compared to the RG mapping (Figure 4b). A deeper analysis confirmed that mapping differences were mostly caused by reads that could not be mapped to the RG (Supplementary Figure S8A).
In a few measurements (1.7%), we observed reduced expression levels for the SSG alignments. These results were not expected, but could be explained by two reasons. First, they could correspond to genotyping errors or uncertainties. If a polymorphism was erroneously called as inherited from Y0036, mapping on the RGwhich corresponds to 968 genome-would result in a greater quantification (therefore lower on SSG). The second explanation is more subtle: during the alignment we required that reads be unambiguously mapped to one region. I.e. all reads mapping to more than one region were excluded from the quantification. In some of the cases with lower expression levels, we observed when mapping to the SSG that the respective locus in the SSG was more homologous to another genomic region than the same locus in the RG. Hence, reads mapping to the respective locus in the SSG were more likely to also map to the second region and therefore got excluded from the transcript quantification (Supplementary Figure S8B-C).
The magnitude of quantification differences depended on the nature of polymorphisms (indels causing a stronger bias than SNPs, Supplementary Figure S8D) and on the density of polymorphisms (Spearman's rank correlation ρ=0.9, p<10 -15 , Supplementary Figure S8E). Nevertheless, the quantification differences between RG and SSG mapping exceeded 10% for only 0.49% of the measurements (all strains and all traits considered).
Next, we evaluated the impact of this distortion on cis-eQTL detection. We compared all cis-linkages of the 1,756 genes for which a difference in quantification was observed. Cis-linkages were more significant for RG than for SSG mapping (p-value <10 -9 , Wilcoxon's signed-rank test). Moreover, when mapping to the RG, we detected 12.8% more cis-eQTLs than when mapping to the SSG (53 versus 47; FDR <5%). Only few genes were affected in our study, because of the small genetic divergence between the progenitor strains (Discussion).

III. Chromosome I inversion
A pericentric inversion of a 2.2Mb region on Chromosome I (Brown et al, 2011) was shared by the progenitor strain 968 but not by the other parental strain Y0036. Despite this inversion, successful meiosis and recombination took place between those two lines. We however observed an important reduction of the crossover frequency in this region. Indeed, in a majority of the segregants (32 strains, 73), no recombination happened in this region (Figure 2, Supplementary Figure S6).

Meiotic segregation and recombination in heterozygous carriers of pericentric inversions have been
reported and studied for decades (Martin, 1991). They can lead to the formation of unbalanced (duplications or deficiencies) or balanced chromosomes (reviewed in (Anton et al, 2005;Morel et al, 2007). For instance in human, on the one hand several cases of cri du chat syndrome originating for chromosome 5 pericentric inversions have been reported (Dobbs et al, 1988;Levy et al, 2002), non exhaustive list), on the other hand pericentric inversion, notably of Chromosome 9, are found with a frequency of 1-2%, and are considered as polymorphisms (Anton et al, 2005;de la Chapelle et al, 1974;Kaiser, 1984). Sperm studies in heterozygote inversion carriers reported a reduction of the rate of meiotic recombination (Morel et al, 2007), as we observed in this recombinant library.
In cases of heterozygous pericentric inversion, meiotic crossovers require the formation of an inversion loop (twisting and folding of the inverted segment) during the synapsis of the homologous chromosomes (meiosis pachytene stage) (Anton et al, 2005). For balanced chromosomes to arise none or an even number of recombinations have to take place in the loop; otherwise, it leads to duplication or deficiencies (Supplementary Figure S7). Interestingly, only even numbers of recombinations were observed within the inverted region (Figure 2), and we did not find any unbalanced Chromosome I in the library. This suggests that the other recombination schemes, leading to unbalanced chromosomes were non-viable spores. We noticed that the markers corresponding to the extremities of the inversion showed identical inheritance profiles in the segregants. As can be seen from Supplementary Figure S8D the balanced chromosomes can only be obtained if the 3' and 5' distal parts of the inverted region always originated from the same chromatid-and are therefore inherited from the same parents.
However, considering the above mentioned recombination abnormalities, one could question the validity of those QTLs. Because of the lack of recombination, the genetic resolution of the inversion is poor. Moreover, there are very little inheritance differences between the markers (more than half of the segregants have no recombination inside the inversion region). Thus, linkages to a marker or to another within the inversion might be equivalent. The extreme case of this phenomenon are the most distal markers of the inversion, which are identical in terms of inheritance patterns (see above). Each was linked to more than 160 sense traits, which overlapped logically almost entirely (96%); however, they were not counted as independent linkages, but as single QTL which exact location (at either locus) could not be resolved. Hence, one could consider the entire 2.2Mb inversion region as a single locus.
Some of those QTLs could also be due to gene disruption or perturbation at the border of the inversion. The expression of few genes located in the vicinity of the inversion break points could be severely modified directly (disruption of the sequence of genes, promoters, etc) or indirectly (modified chromatin state). This hypothesis is supported by the fact that the inversion extremities are eQTL and aseQTL hotspots. Another hypothesis is that numerous polymorphisms are co-inherited because of the lack of recombination within the inversion. The tremendous effects observed on the regulation of expression could therefore be due to several molecular variations. Each one could act on few or multiple traits. Many QTLs would then be identified because of the coinheritance of the variations making the magnitude of the effects apparently greater. However, the population structure dependent QTL mapping that we have carried out should take most of these effect into account.
IV. Analysis of the genomic arrangement of genes linked to the swc5 locus Antisense transcripts originate from read-through transcription, overlapping transcript, bidirectional transcription, or autonomous transcription (Bitton et al, 2011;Chen et al, 2012;Ni et al, 2010;Pelechano & Steinmetz, 2013). The arrangement of genes with antisense transcription can be used to narrow down the potential cause (Figure 8). For example, bidirectional transcription would result in antisense transcription for genes arranged in tandem. Antisense from overlapping genes would lead to an enrichment of convergent and divergent overlapping gene pairs. Transcriptional read-through would lead to the enrichment of convergent gene pairs. The magnitude of this effect may depend on whether the 5'-end of the read-through gene overlaps partly with the gene on the opposite strand. However, in S. pombe the distinction between overlapping and non-overlapping convergent genes is dubious. Transcription termination (and polyadenylation) sites are highly variable in fission yeast (Schlackow et al, 2013;Mata, 2013;Gullerova & Proudfoot, 2008;Gullerova et al, 2011). Our analysis of gene pairs with antisense transcription revealed an enrichment of overlapping and non-overlapping convergent genes and surprisingly a depletion of genes in tandem (overlapping and non-overlapping) (Figure 8). Although the enrichment of convergent gene pairs is consistent with transcriptional read-through, the depletion of genes in tandem required additional analysis.
We visually observed the organization of the fission yeast genome using a browser and noticed that coding protein genes seemed to be organized in stretches of tandem pairs that alternated with stretches of convergent/divergent pairs (Supplementary Figure 17A), which is an indication of a non-random organization of genes in the genome together (Dávila López et al, 2010;Hurst et al, 2004;Michalak, 2008;Li & Du, 2012) . We hypothesized that it was the cause of the observed depletion of tandem genes. Indeed, if a set of genes is enriched for convergent gene pairs, divergent and tandem genes pairs should be depleted.
Here, tandem pairs were fewer, but convergent pairs remained stable (Figure 8). In the case of convergent/divergent stretches, divergent pairs adjacent to selected convergent pairs would also be selected. To formalize this observation, we looked at repartition of "direction swapping" event in the genome, i.e. occurrences of three successive genes being in alternate directions (Supplementary Figure   17A). We compared the number of successive direction swaps in the fission yeast genome to an empirical null distribution obtained by permuting one million times the gene orientations. There was an excess of direction swaps (p <10 -6 , Supplementary Figure 17B), proving that there were more stretches of convergent/divergent pairs than expected by chance in the fission yeast genome.
Next we showed that this non-random gene organization was explaining the depletion of tandem gene pairs.
We randomly selected as many convergent pairs as found among the swc5 aseQTL target genes (853 convergent pairs, regardless of the overlaps). Then as many random aseQTL targets were picked among the chosen convergent gene pairs as there were among the real convergent pairs. We then categorized the other gene pairs thus selected (either divergent or convergent) and repeated this process 1,000,000 times.
We then compared the resulting distributions of the number of divergent and tandem pairs to the one obtained by performing the same analysis but in randomizing the genome by permuting the gene orientations. More tandem pairs and less divergent pairs were selected when randomizing the genome Supplementary Figure 17C). Altogether, these results show that the depletion of tandem gene pairs among the swc5 aseQTL targets was an artifact due to the non-random organization of the S. pombe genome.

V. qPCR assessment of antisense/ sense gene expression
Four target genes of the hotspot 11 were chosen to perform additional exploration of their expression regulation via semi-quantitative real time PCR. We chose target genes whose sense and antisense levels were linked to the hotspot 11, and that were arranged convergently, regardless of their potential overlaps.
The experiments were design to interrogate the sense and antisense levels of the studied genes (which sense and antisense expression levels were linked to of swc5), the sense levels of the other convergent genes, and to detect potential read-through level (Methods, Supplementary Figure 16A). As expect for cdb4, none of the antisense levels were measured in a part of the gene that overlaps (according to annotations) with the other convergent gene. All experiments were performed in biological triplicates. In addition, particular care was taken to estimate and limit the synthesis of primer-independent cDNA, which has been shown to bias the measurement of antisense levels (Haddad et al, 2007;Feng et al, 2012). Because we were expecting widespread changes in expression levels, we restrained our analysis to expression ratios to avoid issues due to global changes affecting the reference gene (Material and Methods).
Results (Figure 7) confirmed the QTLs: we observed an increase of antisense to sense ratio in a pool of segregants carrying wild-type swc5 compared to a pool of segregants carrying swc5-fs. These differences also existed among the parental strains for all but one gene (cdb4), a case of transgressive segregation, which implies that other genetic factor play a role, as detected in the aseQTL analysis (in addition to the swc5 linkage, the antisense level of cdb4 was linked to another locus).
Moreover, the results in the other studied deletion strains are in agreement with the hypothesis that the effect of the hotspot 11 goes through H2A.Z. We observed an increase of the antisense to sense ratio in all cases, except for two genes in the ∆swc5 strain. In these two cases, the inconsistency may be due to either other genetic variants in cis of swc5 or to the effects of auxotrophic markers in the ∆swc5 strain (auxotrophic markers can potentially widely affect gene expression (Brem et al, 2002)).
In the case of the gene pair cdb4/thi4, the increase of the antisense to sense ratio observed in cdb4 could be simply due to an increase of the expression of thi4 (the gene convergent to cdb4 on the other strand, gene B in Supplementary Figure 16A), since these genes are overlapping (the 5'UTR of the thi4 overlap the CDS of cdb4). To assess this possibility we looked that the ratio of the antisense levels of cdb4 (measured in the CDS of cdb4) on the sense level of thi4 (in the CDS of thi4). Results indeed show an increase of this ratio following the antisense to sense ratio thus demonstrating that other mechanisms than gene overlaps are involved (Supplementary Figure 16B).
Finally, to assess and quantify read-through transcripts, we performed qPCR in the gene body of the gene opposite to the swc5 QTL target (gene B, Supplementary Figure 16A) using the RT product of the antisense of the swc5-fs target genes (gene A, RT using AS-A primer, in Supplementary Figure 16A). These experiments were not conclusive even in conditions for which read-through transcription had been shown to take place (∆pht1 and ∆pht1∆clr4 strains). The levels obtained were too close to the background to allow for drawing any conclusions. This neither confirms, nor invalidates that read-through transcription takes place. Besides the special experimental procedure we applied to circumvent the known issue of qPCR in antisense expression measurement (Haddad et al, 2007;Feng et al, 2012), we reached the limit of the method.   -* Significantly enriched for antisense linkage (if the eQTL were randomly distributed across the genome, no bins should contain more than 21 eQTLs, or 24 aseQTLs; all the hotspot described here are enriched for eQTLs) Ω The shared function has been determined by GO enrichment analysis and is idicated when it is shared by more than 10% of the targets (Supplementary Table S4) Supplementary

Remove baseline
Curve Area = Sum of integrals  Figure S1. Growth characteriza on of the recombinant strain library.
A. Experimental procedure. A 48-well micro ter plate, which is compa ble with the BioLector microfermenta on system, is inoculated at a star ng op cal density (OD 295nm ) of 0.2 using cells grown in pre-cultures for 36 hours. The plate is then sealed with gas-permeable membranes and loaded in the BioLector system, and the desired experimental condi ons are chosen. A er the end of a BioLector experiment, growth data are collected from the system and an aliquot of each culture is analyzed for cell viability using the Cell Lab Quanta SC MPL flow cytometer. The growth profiles recorded by BioLector are then retrieved and the growth data is analyzed to extract the quan ta ve measures that precisely describe the growth of the cul vated strains.
B. Growth efficiency is the gain of biomass provided by the given the nutrients in the culture and is calculated by subtrac ng the op cal density of the culture at the sta onary phase of growth from the op cal density at the me of inocula on.
C. An indicator of growth is the area under the growth curve, which is calculated from the sum of integrals.
D. When log-transformed OD values are plo ed over me, the maximum growth rate (μmax) corresponds to the maximum slope within the exponen al growth limit and the lag me (lagT) is the me at which the tangent to the maximal growth rate intersects the horizontal axis. The growth amplitude (Yo, μmax) and the me (Xo, μmax) when cells reach the μmax are also useful indicators of cellular growth.   B-C. Example of a gene (G1 stands for SPCC5E4.07 on Chromosome III posi on 656625 to 657071) with a decreased quan fica on when mapped on SSG. The gene sequence is not polymorphic. However a homologous sequence on gene G2 (G2 stands for SPBC776.11 on chromosome II posi on 3196491 -3196937) is polymorphic between the progenitor strains. In the Y0036 background those two sequences are iden cal and RNA-seq reads cannot be unambiguously mapped (C).
D. Comparison of the quan fica on bias in genes carrying SNPs vs indels. The quan fica on bias is the log ra o expression values obtained using SSG mapping over RG mapping. Only the values which were higher in SSG mapping were considered. The bias is significantly greater when genes contain indels < 10 −15 , one-sided Wilcoxon rank-sum test). The bias is also significantly greater when all measurements are considered ( < 10 −15 , one-sided Wilcoxon rank-sum test).
E Correla on between the polymorphism density and the magnitude of the expression quan fica on bias. For each gene containing a polymorphism, the log-ra o of its average expression obtained via SSG mapping over RG mapping is plo ed against its SNP density. Two points are not plo ed for visual ease (SPCC1906.03 x=0, y=-0.435; SPNCRNA.1685 x=2.94, y=0.372). Both quan es are posi vely correlated (Spearman's = 0.90, < 10 −15 ). B. Comparison of the error of quan fica on when using strain-specific mapping (SSG) versus reference mapping (RG). Quan fica on errors were evaluated as the absolute log difference between the simulated expression levels and the measured expression levels. The number in the quadrants represent the number of points in each. In most of the case strain-specific genome mapping reduced the measurement error, but the magnitude of the improvement is small. The genes linked to these hotspots defined groups that were subsequently analysed (Supplemetary Table S3).
B. Comparison of the eQTL hotspots found when using alterna ve normaliza on strategies. In order to evaluate whether the detected hotspots were ar facts due to normaliza on biases, we tested other normaliza on strategies: the same as described (Materials and Methods) without batch correc on (green) and the quan le normaliza on (Bolstad et al, 2003) (blue). eQTLs were mapped and hotspots were detected as previously described (Materials and Methods). Fewer eQTLs were detected when using these alterna ve strategies. The numbered regions correspond to the hotspots detected using the original normaliza on strategy. All the previously detected hotspots were also significant using both alterna ve normaliza on strategies, except for hotspots 4 and 6 that were not significant when the batch correc on was not applied (green). A, D, G. The ra os of the pep de rela ve to spiked in aliquots of a chemically synthesized internal heavy reference pep des for all samples are shown. Pep de 1 (A) is located located before the frameshi . Pe des 2 and 3 (D, G) are located a er the frame shi . MS-signals with signal-to-noise ra os of less than 3 are not considered (*).  Figure S15. Compara ve average Pht1 occupancy in 968 (scw5 + ) of the genes for which an sense expression changes are linked (purple, n=1,384) and not linked (black, n=3,722) to the swc5 locus. There is significantly more Pht1 at the +1 nucleosome in genes whose an sense level is associated with the swc5 QTL than in the others ( < 10 −15 , one sided Wilcoxon rank-sum test). A. Schema c representa on of the design of the RT-qPCR experiments. Gene A represents the gene, which sense and an sense levels were linked to swc5 locus. Gene B is the gene on the opposite strand arranged in a convergent way. Two RT reac ons were performed for each strain interroga ng the sense levels (using, the RT primers S-A and S-B, in blue, for these gene pair and the sense-RT primers of the other convergent gene pairs as well as a primer specific of cdc2, for normaliza on), and the an sense level (using, the RT primers AS-A, in red, for these gene pair and the primer of the an sense-RT primers of the other convergent gene pairs as well as a primer specific of cdc2, for normaliza on). The qPCR to assess the sense levels of gene A and gene B were done using their respec ve qPCR primers (qpcr-A, and qpcrB primers) and the product of the sense RT. The qPCR assessing the level of the an sense of gene A was performed using the qpcr-A primers and the product of the an sense RT. Finally read-through was assessed using the qpcr-B primers with the product of the an sense RT.

Supplementary
B. The increase in an sense level of cdb4 is not due to an increase of thi4 sense level. The 5'UTR of thi4 overlap with the CDS of cdb4. Therefore the increase of cdb4 an sense or sense level could simply be due to an increase of thi4 sense level. Here we show the ra o of cbd4 an sense level on thi4 sense levels. This ra o follows the an sense-to-sense ra o of cdb4 ( Figure 7A). Error bars indicate the standard error to the mean among the three biological replicates.  Figure S17. The deple on of tandem genes pairs among the swc5 aseQTL target is due to a genome organiza on bias.
A. Schema c representa on of a stretch of convergent/divergent gene pairs. In this genome organiza on, when a gene of a convergent gene pair is an aseQTL target, the adjacent divergent pair is selected. Orange arrows indicate orienta on swaps. Successive orienta on swaps characterize convergent/divergent stretches.
B. Number of successive gene direc on swaps in the fission yeast genome (red dot), and in 1,000,000 randomized genomes (boxplot). The orienta on of the genes in fission yeast genome is not random.
C. Distribu on of the number of adjacent tandem and divergent gene pairs selected when picking randomly aseQTL convergent pairs in the S. pombe genome (fuchsia), and a shuffle genome (green).  Figure S18. 3'-RACE to detect read-through transcripts from the gene tpp1. Three different oligonucleo des were used to prime the 3'-RACE experiment: 3'-RACE Primer 1 (A, B), located into tpp1 ORF; 3'-RACE Primer 2 (A, C) and 3'-RACE primer 3 (A, D), located downstream of the normal 3' termina on signal. Distance between oligonucleo des is indicated in A. At least five prominent 3'-RACE signals (RS) have been located with the three oligonucleo des used (RS1 to RS5) as seen in B, C, and D. Most tpp1 transcripts terminate at annotated tpp1 3' end (RS1 and RS2), although read-through transcripts are detected (RS3, RS4 and RS5) are detected even in wildt type (and DBVPG2812). Y0036 strain as well as other mutants known to affect transcript read-through show semi quan ta ve stronger bands in the 3'-RACE signals obtained downstream of the normal 3' ends (B and C).  Figure S19. Evalua on of the influence of sequencing depth on the RNA-seq based genotyping.
The reads from one sample (sample 59) were successively subset, and the genotype of strain R1-4 was inferred based on the subsets. For the sake of clarity, only one sample is shown; we obtained similar results with all the samples tested.
A. Genotype calls at 4,578 sites polymorphic between 968 and Y0036 parental strains. The grey horizontal bars represent the effec ve sequencing depth of the subsets (the number of mapped bases scaled to the genome size); the values are also indicated (1x correspond to one me the genome size).
B. The propor on of sites that could be genotyped in the different subsets is plo ed against the effec ve sequencing depth. The black line represents the sites for which the genotype could be directly called. The green line corresponds to the final genotypes a er the missing values were imputed from the segrega on pa ern of the flanking polymorphisms segrega on (Materials and Methods).