Control of translation during the unfolded protein response in maize seedlings: Life without PERKs

Abstract The accumulation of misfolded proteins in the endoplasmic reticulum (ER) defines a condition called ER stress that induces the unfolded protein response (UPR). The UPR in mammalian cells attenuates protein synthesis initiation, which prevents the piling up of misfolded proteins in the ER. Mammalian cells rely on Protein Kinase RNA‐Like Endoplasmic Reticulum Kinase (PERK) phosphorylation of eIF2α to arrest protein synthesis, however, plants do not have a PERK homolog, so the question is whether plants control translation in response to ER stress. We compared changes in RNA levels in the transcriptome to the RNA levels protected by ribosomes and found a decline in translation efficiency, including many UPR genes, in response to ER stress. The decline in translation efficiency is due to the fact that many mRNAs are not loaded onto polyribosomes (polysomes) in proportion to their increase in total RNA, instead some of the transcripts accumulate in stress granules (SGs). The RNAs that populate SGs are not derived from the disassembly of polysomes because protein synthesis remains steady during stress. Thus, the surge in transcription of UPR genes in response to ER stress is accompanied by the formation of SGs, and the sequestration of mRNAs in SGs may serve to temporarily relieve the translation load during ER stress.


Bayesian Modeling Pipeline
In the development of our modeling framework, we use zero-inflated Poisson distribution to model RPF count data Y g1jk , for dealing with the excess of zeros . Let Z g1jk be independently distributed with a binary probability mass function having parameter p g1jk . Z g1jk is used to specify which mixture component the observed count data Y g1jk comes from. Take the Y g1jk 's to be independent with conditional distributions given Z g1jk = 1 that are Poisson distribution with parameter λ g1jk . The distribution of Y g1jk can be expressed as The parameter p g1jk is the probability of Y g1jk coming from Poisson part, which may vary depending on the sequencing depth and gene g. Thus we consider a logistic relation modeled as follows, where S 1jk is the normalization factor that adjusts for sequencing depths variation and potentially other technical effects across the replicates. 1

Supplemental Methods file 1
For RNA-seq read count data Y g2jk , we assume it follows a regular Poisson distribution with true expression mean parameter λ g2jk , i.e., For both mRNA and RPF samples, λ gijk represents the expression mean of k th replicate of j th time point in i th protocol of gene g. We can express expectations on read counts λ gijk as a function of normalization factor S ijk , baseline quantity related to RPF abundance λ g in the first time point, a quantity u gi that relates the protocol difference between mRNA and RPF, a quantity t gj that relates the fold change between two time points, a quantity w gij that captures the effect of the time on translation, and a quantity b gjk that relates to the pairing signal between RNA and RPF. In particular, the expected expression mean λ gijk is modeled as below, where S ijk is the normalization factor of k th replicate of j th time point in i th protocol; λ g is the normalized expression mean baseline for i = j = 1 (RPF sample in the first condition); u gi is the fold change between two protocols with u g1 = 1; t gj is fold change between two time points with t g1 = 1; w gij is the fold change between the translation efficiency of two time points, with w g11 = w g12 = w g21 = 1; b gjk denotes the one-to-one correspondence between RPF and mRNA samples, which is considered to be random. Note that the expression (3) is equivalent to a log-linear model if we express both sides in log scale, and the term w gij corresponds to the interaction between protocol and time, which is of main interest.
Differential translation can be characterized by the dissimilarity between the changes in mRNA and RPF expressions across the two conditions. This is equivalent to determining whether the parameter w g22 in our model (3) differs significantly from 1. And the corresponding hypothesis test is for each gene g.
We assume a hierarchical model for the gene-specific parameters (β g , λ g , u g2 , t g2 , w g22 , b gjk ) mentioned in equations (2) and (3). Since the simple null hypothesis we are most interested in is w g22 's H g w0 : w g22 = 1, and often times biologists are also interested in the differential expression analysis for RPF data between the two time points, which corresponds to testing whether the parameter t g2 differs significantly from 1, so a mixture of a point-mass at 1 and a Gamma distribution (due to conjugacy) is used as the hierarchical model for w g22 and t g2 . Again, we use Gamma distribution as the prior distribution for λ g , u g2 and b gjk because of their conjugacy, For computational convnience, we assume all the gene-specific parameters in (5) to be independent.
To formulate our problem of interest more clearly, we introduce latent variables Z gt and Z gw to denote which mixture component the time effect t g2 or translational efficiency change w g22 come from, where π t1 and π w1 denote the chance that time effect and translation efficiency effects equal to 1. Thus, for each gene g, our main focus becomes H g w0 : Z gw = 0 vs. H g w1 : Z gw = 1.

Markov Chain Monte Carlo Simulation
There are (11 + 2G) unknown parameters of interest in our model, The data likelihood function with observed data Y = {Y 1111 , Y 1112 , . . . , Y G222 } is Here we propose a fully Bayesian analysis for the parameter estimation. For all unknown parameters in (5), due to either conjugacy or to reduce the complexity of computation of the posterior distribution, we assume independent non-informative priors as follows, After specifying the prior distributions, data will be used to update these prior distributions to obtain the posterior distributions for inference. Posterior inference in our proposed model is implemented by using Markov Chain Monte Carlo (MCMC) simulation (Tierney, 1994). Gibbs sampling (Geman and Geman, 1984) is the most commonly used tool to perform MCMC simulations for Bayesian hierarchical models. We applied MCMC Gibbs sampling method with Rjags to implement posterior inference.
From Bayesian inference, our null hypothesis or any quantities of interest could be easily obtained from the posterior distribution. For example, our null hypothesis of differential translation could be assessed through the posterior probability p g = P r(Z gw = 0|Y ) for each gene g. In addition to the simple hypothesis testing problem, we could also applied our method to do other kinds of hypothesis testing, such as to test whether the fold change is within a certain interval or not.

Bayesian FDR Control
In ribo-seq studies, a large number of hypotheses are simultaneously tested, each relating to a gene. Hence, multiple testing procedures that control the number of false significant results is essential. False discovery rate (FDR) (Benjamini and Hochberg, 1995), defined as the expected proportion of false positives among the rejected hypotheses, has been the choice of error criterion in genomic studies.
The Bayesian version of FDR is an alternative way to estimate the FDR within the Bayesian framework. It has been proposed and discussed by several authors including Genovese and Wasserman (2003) and Newton et al. (2004). The Bayesian FDR can be obtained by using posterior probabilities of the null hypothesis. More specifically, for the analysis of translational efficiency change, for each gene g, g = 1, . . . , G, we denote the posterior probability that gth null hypothesis is true by v g = P (w g22 ∈ ∆ 0 |Y g ). If we are interested in detecting differential translated genes, with ∆ 0 = {1}, P (w g22 ∈ ∆ 0 |Y g ) is the posterior probability that gth gene is not differentially translated. P (w g22 ∈ ∆ 0 |Y g ) can be estimated by the proportion of the posterior samples obtained from MCMC for gene g that falls into the null set ∆ 0 , i.e., v g =P (w g22 ∈ ∆ 0 |Y g ) = 1 N N m=1 I(w m g22 ∈ ∆ 0 |Y g ), where N is the number of posterior samples. Then the posterior probability which the g th gene is differentially translated is estimated through 1 −v g . We reject H g w0 if the estimated posterior probabilityv g is smaller than a critical value c * . The critical value c * is chosen based on controlling the FDR below a target level γ, for example, 0.05, i.e., c * = sup{c : F DR(c) < γ}, where F DR(c) = G g=1v g I(v g < c) G g=1 I(v g < c) .
So the Bayesian FDR controlled at level γ can be calculated by .