Transcriptome-wide analysis of trypanosome mRNA decay reveals complex degradation kinetics and suggests a role for co-transcriptional degradation in determining mRNA levels

African trypanosomes are an excellent system for quantitative modelling of post-transcriptional mRNA control. Transcription is constitutive and polycistronic; individual mRNAs are excised by trans splicing and polyadenylation. We here measure mRNA decay kinetics in two life cycle stages, bloodstream and procyclic forms, by transcription inhibition and RNASeq. Messenger RNAs with short half-lives tend to show initial fast degradation, followed by a slower phase; they are often stabilized by depletion of the 5′–3′ exoribonuclease XRNA. Many longer-lived mRNAs show initial slow degradation followed by rapid destruction: we suggest that the slow phase reflects gradual deadenylation. Developmentally regulated mRNAs often show regulated decay, and switch their decay pattern. Rates of mRNA decay are good predictors of steady state levels for short mRNAs, but mRNAs longer than 3 kb show unexpectedly low abundances. Modelling shows that variations in splicing and polyadenylation rates can contribute to steady-state mRNA levels, but this is completely dependent on competition between processing and co-transcriptional mRNA precursor destruction.


Supplementary Table S4
Raw data for mRNA degradation, half-lives and fit parameters for all genes Sheet 1: Raw and normalised RPM (reads per million) data for procyclic forms. The first set of columns show the raw data. The second set in yellow are normalised RPMs. To obtain these the RPMs were first calculated based on the non-redundant set of genes, then normalized to the SL signal as explained in the text (columns labelled "SL norm"). After normalization to time zero (not shown) the reads were used as input to the C++ code. To plot the decay curve for a gene of interest, use the data from columns R -AG. The time point after Actinomycin D addition is indicated by the number after "t", for example t30 means 30 min.
Sheet 2: Raw and normalised RPM (reads per million) data for bloodstream forms. Details as for sheet 1.
Sheet 3: Half-life and fit parameters for all genes, procyclic forms. Eq9 = fast-slow decay; Eq10 = slow-fast decay; lambda = rate of transition between states; mu = rate of degradation; nu = rate of initial degradation; U = transcript lifetime; Error = least squares. See (Deneke et al., 2013) for details. Values for isoform-specific regions of the PGK (Tb927.1.700-720) and THT (Tb927.10.8440,8530) genes are also included but the errors are very large, probably because the regions are quite short or include rather repetitive parts of the 3'-UTR.
Sheet 4: Half-life and fit parameters for all genes, bloodstream forms. Details as in Sheet 3.

Supplementary Table S5
Functional group half-lives and developmental regulation A Student t-test was performed for each functional class between the two life stages. Significant differences (p-value < 0.05) are in blue, and the extent of difference (bloodstream / procyclic ratio of mean half-life (HL) values) is coded in shades of green. Developmental regulation results from different analyses The regulation is plotted on a log 2 scale (0= no regulation, 2 = 4x regulation, -2 -1/4, etc.) Correlation coefficients and equations are also for log 2 -transformed data and were plotted using Microsoft Excel. Each spot represents an individual unique open reading frame. All outliers were 3 3 included. Datasets were: Fadda: this paper, rRNA-depleted RNA, sheared and 300nt-size-selected before cDNA synthesis; Singh (Singh et al., 2014) poly(A)+ RNA, sheared and 300nt-size-selected before cDNA synthesis; Siegel (Siegel et al., 2010) poly(A)+ RNA, not sheared before cDNA synthesis; Vasquez (Vasquez et al., 2014) poly(A)+ RNA sheared and 20nt-size-selected before cDNA synthesis. The magenta dashed line is perfect correlation.

Supplementary Figure legends
In theory, differences due to methods of RNA and library preparation, and sequencing, should cancel out when ratios are calculated. The fact that the correlation in (A) is much better than crosslaboratory correlations suggests that culture conditions and the precise parasites used are important.  The numbers of mRNA per cell are plotted on a log 2 scale but the scale has been converted to absolute numbers for ease of viewing. Correlation coefficients and equations are for the log 2transformed data; coefficients for non-log-transformed data are in parentheses. Results are from: Fadda -this paper and Manful (Manful et al., 2011) either poly(A)+ or rRNA-depleted RNA, sheared and 300nt-size-selected before cDNA synthesis; Singh (Singh et al., 2014) poly(A)+ RNA, sheared and 300nt-size-selected before cDNA synthesis; and Vasquez (Vasquez et al., 2014) poly(A)+ RNA sheared and 20nt-size-selected before cDNA synthesis. The within-lab correlations are best, especially those performed most recently (Singh vs Fadda). The Manful dataset was generated using older library construction and sequencing technology. The magenta dashed line is perfect correlation.   Overall mRNA decay was determined by fitting an exponential curve into data points of 3 biological replicates (and 2 technical replicates for BF) of the Northern SL hybridization blots. The values used to normalize the RNAseq data corresponding to the curve are: bloodstream forms t0 0.92, t5 0.82, t10 0.73, t20 0.59, t30 0.46, t60 0.23, t120 0.06. procyclic forms t0 0.92, t15 0.79, t30 0.68, t60 0.50, t120 0.27, t180 0.15, t240 0.08. Note that the data for E and F are also shown In Figure 2; we show them here as well because it is easier to compare the two curves. without the Northern and qPCR results,

Supplementary Figure S6
Comparison of the current half-life results with previous ones.
A. The half-lives obtained for bloodstream-form mRNAs in this paper are compared with those calculated in Manful et al., (Manful et al., 2011), from incubation with Actinomycin D for 30-min. Stable mRNAs (half-life >120 min) were arbitrarily assigned a half-life of 240 min. Results are shown on a linear scale. The P value from a Students t-test (comparing the two datasets and counting only half-lives between 2 min and 120 min) was 1.4 E-37.
B. Same results as in (A), but all data are log 2 transformed and the formulae and correlation coefficients are for the log-transformed data. The mRNAs are divided into three equally-sized groups according to half-life. The correlation is best for more long-lived mRNAs. This may be partly because the half-lives are nearer to the 30-min time point used in the Manful study. The very shortlived mRNAs yield very few read counts at the 30-min time-point, which makes half-life estimates based on that point alone inaccurate: note that no half-lives lower than 7 min could be measured.

Supplementary Figure S7
Comparison of RNAseq degradation curves with published Northern blot or qRT-PCR measurements.
RNASeq results are in black, Northern results in green and real-time qRT-PCR results are in cyan. The Northern and real-time qRT-PCR PCR data are those that were used to calculate half-lives listed in Table 2. The different symbols represent different biological replicates for qRT-PCR. For 5 5 Northerns, each symbol represents the average of at least 3 independent biological replicates, and the different symbols are from different replicate sets. For consistency, clear outliers in both datasets have been included. All results are for bloodstream forms.

Supplementary Figure S8
Regulation of abundance correlates poorly with regulation of half-life.
The steady-state mRNA abundance in bloodstream forms, divided by that in procyclic forms, is on the x-axis, while the corresponding half-life ratio is on the y-axis.

Supplementary Figure S9
The half-life -abundance ratio does not depend on chromosomal position The Figure is a schematic depiction of chromosome 10. Genes were ordered according to their position on the chromosome. Transcription initiation (green and red bars) and termination points (blue bars) were assigned manually according to ORF direction and chromatin marks (Siegel et al., 2009). The arrows show transcription direction and were also placed manually, and since some transcription units are short there may be some errors in the assignments. Purple bars: the half-life (in min) was divided by the number of mRNAs/cell/gene, and the result was log 2 transformed. High bars therefore mean that the mRNA has unexpectedly low abundance for its half-life. A space means only that there is no data for this particular position, or that it was not in the unique gene set.

Supplementary Figure S10
The relationship between half-life and abundance A. The mRNA lengths were extracted from TritrypDB and mRNAs lacking annotated splice or poly(A) sites were excluded. The coding sequences were then ranked according to the length and divided into 5 bins of equal size. Correlation coefficients were calculated, using log-transformed data, between the half-life and the number of mRNAs per cell per gene.
B. All mRNAs with half-lives between 8 and 16 minutes were extracted, ranked according to the length then divided into 5 bins of equal size (237 for procyclic forms, 435 for bloodstream forms). For each bin, the arithmetic mean of the abundance and the standard deviation are plotted. Note that given the sample size, standard errors would be too small to be visible on the plot. The mRNA abundances for each of bins 2-5 were significantly lower than those of bin 1 (Student T-test, P value < 0.02). Analyses using log-transformed data confirmed the relationship between length and abundance (not shown).
C. The number of mRNAs per cell per gene was plotted against the half-life for bloodstream forms, considering mRNAs that were less than 1kb long and had measured 5' precursor half-lives of less than 3 min. The grey squares indicate the results of modelling steady-state mRNA abundance with the published model (Haanstra et al., 2008) assuming a 5'-trans splicing half-time of 1 min. Note that the relatively high correlation coefficient for the data is partly caused by the fact that the sample size is smaller than in other plots.
D. The number of mRNAs per cell per gene was plotted against the half-life for procyclic forms, considering only mRNAs that were less than 1kb long and had measured 5' precursor half-lives of less than 4 min. A. The constant for polyadenylation (k 3 ; 0.41 min -1 ) was multiplied by (600nt)/(mRNA length in nt) and degradation of the 5'-trans spliced precursor (k 5 ) was kept constant at 0.08 min -1 .
C. Similar to (B), but reaction 4 (degradation of the un-spliced precursor) was also removed Note that the data for E and F are also shown In Figure 2; we show them here as well because it is easier to compare the two curves. without the Northern and qPCR results. The steady-state mRNA abundance in bloodstream forms, divided by that in procyclic forms, is on the x-axis, while the corresponding half-life ratio is on the y-axis.  Convergence / stop Direction of transcription (inferred from boundaries) Stop-start Start log2 (thalf of mRNA (min)/mRNAs per gene) Chromosome 10 Supplementary Figure S9: The half-life -abundance ratio does not depend on chromosomal position The Figure is a schematic depiction of chromosome 10. Genes were ordered according to their position on the chromosome. Transcription initiation (green and red bars) and termination points (blue bars) were assigned manually according to ORF direction and chromatin marks (Siegel et al., 2009). The arrows show transcription direction and were also placed manually, and since some transcription units are short there may be some errors in the assignments. Purple bars: the half-life (in min) was divided by the number of mRNAs/cell/gene, and the result was log2 transformed. High bars therefore mean that the mRNA has unexpectedly low abundance for its half-life. A space means only that there is no data for this particular position, or that it was not in the unique gene set.