Cellular control of protein levels: A systems biology perspective

How cells regulate protein levels is a central question of biology. Over the past decades, molecular biology research has provided profound insights into the mechanisms and the molecular machinery governing each step of the gene expression process, from transcription to protein degradation. Recent advances in transcriptomics and proteomics have complemented our understanding of these fundamental cellular processes with a quantitative, systems‐level perspective. Multi‐omic studies revealed significant quantitative, kinetic and functional differences between the genome, transcriptome and proteome. While protein levels often correlate with mRNA levels, quantitative investigations have demonstrated a substantial impact of translation and protein degradation on protein expression control. In addition, protein‐level regulation appears to play a crucial role in buffering protein abundances against undesirable mRNA expression variation. These findings have practical implications for many fields, including gene function prediction and precision medicine.

synthesis had become clear, including the genetic code and the role of messenger RNA (mRNA) [3]. This was directly linked to the discovery of the lac operon by Jacob and Monod, which provided the first model of how protein synthesis could be regulated [4,5]. The operon concept demonstrated that, in response to changes in the environment, protein synthesis can be altered by turning genes on and off, rather than transforming one enzyme into another as postulated by the competing "enzymatic adaptation" theory [6,7].
In the decades that followed, the field of molecular biology has generated detailed insights into the process of gene expression, from the transcription of genes into mRNAs and the translation of mRNAs into proteins to the degradation of mRNAs and proteins. These advancements have provided a comprehensive understanding of the intricate mechanisms and molecular machinery involved in protein synthesis. However, the quantitative relationship between genome, transcriptome and proteome, along with their kinetics and dynamic changes in response to perturbations, is less well understood.

Copy numbers and dynamic range
Selbach and colleagues first quantified the entire protein expression cascade on a genome-wide scale [8]. They found that, on average, proteins were five times more stable and 2800 times more abundant than mRNAs. For example, a diploid mammalian cell contains two copies of each gene, which produce a median of 17 mRNA copies and 50,000 protein copies. Note that these values are based on techniques that estimate, rather than measure, copy numbers. However, a consistent observation across multiple similar studies is that cellular copy numbers of proteins exceed those of mRNAs by three to four orders of magnitude [8-11]. For example, Lahtvee et al. found that genes in Saccharomyces cerevisiae produce a median of 9 mRNA and 6384 protein molecules per cell [10]. Therefore, in terms of molecules per cell, proteins are far more abundant than mRNAs, which are more abundant than genes.
Furthermore, there are large differences in the number of proteins (and mRNAs) produced by different genes (Figure 1B). For example, a HeLa cell contains an estimated 20 million copies of the cytoskeleton protein vimentin and only 6000 copies of the FOS transcription factor [12]. Overall, the dynamic range of protein abundances in cells is far greater than that of mRNAs [8-11]. Protein abundances can span up to eight orders of magnitude within a cell, compared to four orders of magnitude for mRNAs [11].

Gene expression changes
Neither mRNA nor protein concentrations are static entities. Cells dynamically regulate their abundances according to biological conditions, for example during development, the cell cycle and in response to perturbations. "Multi-omics" studies that examine both transcriptome and proteome changes across multiple biological conditions are challenging and still quite rare. However, one area where robust findings have emerged in recent years is that of mRNA and protein abundance changes across human tissues, which have been assessed using a range of methodologies [11,13-16]. This research suggests that transcriptome and proteome differences between tissues are primarily quantitative rather than qualitative. For example, a comprehensive analysis of 32 human tissues revealed that 85% of the proteins quantified in any given tissue were detected across all tissues [16]. However, of the genes identified at both the protein and RNA level, the study classified 31.8% of proteins (43.3% of transcripts) as tissue-enriched, and 12.8% of proteins (8% of transcripts) as tissue-specific. Although the numbers of tissue-enriched mRNAs are similar, they do not always match the enriched proteins. Indeed, about half of the gene products that are tissue-enriched or tissue-specific at the protein level are not enriched at the mRNA level [16]. The brain appears to be a hotspot for such proteins.
FIGURE 1 Quantitative and kinetic aspects of protein expression control. (A) Schematic drawing of the protein expression cascade. (B) The range of abundances of mRNAs and proteins in human tissues. Note that proteins are significantly more abundant than mRNAs and also span a wider dynamic range. Data were taken from ref. [11] and restricted to genes for which both mRNA and protein levels were quantified in at least half of all tissues. The abundance units are FPKMs and iBAQs for mRNAs and proteins, respectively. (C) The range of expression changes of mRNAs and proteins across human tissues, relative to their median abundances. For example, a 2-fold increase in the brain means that a protein is twice as abundant in the brain as in the median tissue. The plot shows the distribution of all relative fold-changes, for the same genes and tissues as in (B). Note that the vast majority of fold-changes are smaller than ±10-fold. (D) The range of half-lives of mRNAs and proteins in mouse fibroblasts. Data were taken from ref. [8].
As a general rule, relative fold-changes of mRNAs and proteins across biological conditions are significantly smaller than the massive dynamic range observed across genes (Figure 1C). For example, while the abundances of different proteins in a cell span many orders of magnitude, individual proteins rarely change more than 10-fold across tissues [11,17]. Interestingly, proteomes correlate more strongly across tissues than transcriptomes do [11].
There is some variation between multi-omics tissue studies in terms of the reported differentially expressed gene products and their fold-changes, likely due to the use of different methodologies, samples and statistical approaches [16]. However, the emerging consensus is that a large portion of the proteome is expressed ubiquitously, with expression levels modulated according to tissue or condition [11,13-16].

Impact of genetic perturbation on the proteome
The study of protein responses to gene deletions provides valuable insights into the relationship between genes and proteins. The advent of high-throughput proteomics technologies has made systematic screens of gene knock-out collections in human cell lines [18], yeast [19-21] and E. coli [22] feasible. These studies show that a multitude of proteins undergo abundance changes upon gene deletion. The number of affected proteins, as well as the magnitude of change, can vary significantly depending on the phenotype and function of the deleted gene.
Most proteins change primarily in one direction, that is, they are either generally up- or generally downregulated, and only a few proteins exhibit changes in both directions across different knock-outs [21]. The regulation of protein abundances by post-transcriptional processes has been highlighted through correlation with transcriptome data [20]. In particular, protein turnover has been identified as a significant factor in the differential expression of proteins across knock-outs [21].
Many of the protein changes observed in knock-outs are not specific or functionally related to the deleted gene. For example, an interconnection between growth rate and proteome response has been observed, with a slow growth rate associated with a high number of (unspecific) differentially expressed proteins [20,21]. Furthermore, it has been shown that gene deletions can result in chromosomal duplications, leading to the differential expression of a large number of proteins that are encoded on the affected chromosome but not necessarily functionally related to the deleted gene [21].
Protein complex subunits co-vary across knock-outs [19,20,22] and often exhibit changes in abundance when another subunit is deleted. In many cases, deletion of one subunit destabilises a complex and leads to the degradation of the remaining subunits. In other cases, upregulation of the remaining subunits occurs as a result of feedback mechanisms. For instance, deleting proteasome subunits can increase the abundance of the remaining complex. Proteasome abundance is regulated by the short-lived transcription factor Rpn4 through a negative feedback loop, which ensures the maintenance of proteasome levels during cellular stress. Consequently, when a subunit crucial for the function of the proteasome is deleted, Rpn4 accumulates and the remaining members of the complex become more abundant [21].

Turnover kinetics
In a living cell, proteins and mRNAs are continuously synthesised and degraded (turned over). In general, proteins are more stable than mRNAs, with one study reporting median half-lives of approximately 46 h for proteins and 9 h for mRNAs in mammalian cells [8]. Similarly to abundances, half-lives vary widely between different mRNAs [8,23] and proteins [8,24] (Figure 1D). Some proteins can be exceptionally long-lived. For example, histones in neurons have half-lives of 50-100 days [25,26]. Within cells, protein degradation is facilitated enzymatically by proteasomes and lysosomes [27], but proteins are inherently stable macromolecules. In archaeological samples, peptides as old as 1.9 million years have been discovered [28].
The biological determinants of mRNA and protein half-lives in cells are poorly understood, despite the growing clinical significance of this subject. Understanding RNA stability is important for RNA vaccine design [29], while protein stability is a factor in the development of PROTACs, an emerging class of drugs inducing protein degradation [30]. In mammals, there is no significant correlation between mRNA and protein half-lives of individual genes [8,31]. However, when analysing groups of genes whose mRNAs and proteins exhibit similar half-lives, certain broad trends emerge [8,31]. For example, housekeeping genes that are involved in constitutive processes like translation and respiration produce both stable mRNAs and stable proteins. In contrast, genes with regulatory functions, such as transcription factors, RNA processing factors, signalling proteins and cell cycle regulators, tend to generate short-lived mRNAs and proteins [8,31-33]. This shows that mRNA and protein stabilities are linked, in a broad sense, to gene function. In addition, the half-lives of both transcripts and proteins correlate with their abundance [8,32-34]. In other words, abundant mRNAs tend to be more stable, and the same applies to abundant proteins. Note that abundance and function are related parameters, as housekeeping proteins are generally far more abundant than proteins with regulatory functions. Together, these observations suggest that gene expression kinetics evolved to balance energetic constraints, such as favouring stability in abundant metabolic enzymes, and the ability to rapidly adapt to stimuli, which is facilitated by the short-lived nature of regulatory factors [8,35-38].
Finally, half-lives can differ between cell states and cell types [26,33,42,48]. However, the extent to which the adaptation of turnover rates contributes to the establishment and stabilisation of new cell states remains an intriguing open question.
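To give a feel for what the median half-lives quoted above imply, the following minimal Python sketch converts a half-life into a first-order degradation rate constant and computes the fraction of an initial pool that remains after a given time. It assumes simple exponential decay with no new synthesis, which is an idealisation rather than a model taken from the cited studies; the function names are hypothetical and the 24 h time point is an arbitrary choice.

```python
import math

# Median half-lives in mammalian cells as reported in ref. [8].
PROTEIN_HALF_LIFE_H = 46.0
MRNA_HALF_LIFE_H = 9.0


def decay_rate(half_life_h):
    """First-order degradation rate constant (per hour) for a given half-life."""
    return math.log(2) / half_life_h


def fraction_remaining(half_life_h, t_h):
    """Fraction of the initial pool left after t_h hours, assuming
    exponential decay and no new synthesis (an idealisation)."""
    return 0.5 ** (t_h / half_life_h)


for name, half_life in [("protein", PROTEIN_HALF_LIFE_H), ("mRNA", MRNA_HALF_LIFE_H)]:
    k = decay_rate(half_life)
    left = fraction_remaining(half_life, 24.0)  # arbitrary 24 h time point
    print(f"{name}: k = {k:.3f} per h, fraction left after 24 h = {left:.2f}")
```

Under these assumptions, a pool of median-stability mRNA is largely depleted within a day, whereas most of a median-stability protein pool persists.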

AMPLIFICATION OF COMPLEXITY: PROTEOFORMS, NEW PROTEINS AND UNSTABLE PROTEINS
Protein synthesis involves not only a profound amplification of signal, but also a substantial increase in the molecular diversity of gene products [49,50]. This is because gene products can be modified at each stage of the protein expression cascade. A single gene can give rise to multiple mRNA species, due to alternative transcription start sites and various RNA processing steps, such as alternative splicing and RNA editing. Moreover, many different versions of a protein can be produced from a single mRNA species, for example through the use of alternative start or stop codons, post-translational modifications and cleavages [49,50]. As a result, there is an escalation of molecular and functional complexity from the genome to the proteome. The different molecular forms in which the protein product of a single gene can be found are referred to as "proteoforms" [51]. While there are approximately 20,000 protein-coding genes in the human genome, the average human cell is predicted to contain over one million different proteoforms [52]. The extent to which proteoforms contribute to the functional diversity of the proteome is unclear, because the vast majority of proteoforms have not been functionally characterised, especially those involving post-translational modifications. However, a recent phenotypic screen of hundreds of phosphorylation-site mutants in yeast has identified growth phenotypes for 42% of these mutants, suggesting that a large proportion of phosphorylation events are indeed functionally relevant [53]. Proteoforms will differ greatly between cell types, states and biological conditions, and the functional complexity of the proteome is further enhanced by differences in protein conformations, interactions and (subcellular) localisation. Consequently, the genome, transcriptome and proteome differ radically in their molecular and functional complexity.

New proteins from "non-coding" regions
Although the majority of the human genome is transcribed, only ∼1.2% of this RNA is considered protein-coding [54,55]. This is in stark contrast to bacterial genomes, of which around 90% is protein-coding [55]. However, in recent years the advent of ribosome profiling (Ribo-seq) has dramatically altered our assumptions about the number of protein-coding human genes [56,57]. Ribo-seq has revealed thousands of translated open reading frames (ORFs) within long noncoding RNAs and untranslated regions (UTRs) of known mRNAs [58]. Most of these are small ORFs (smORFs) that had been overlooked so far due to their size (<100 amino acids). How many of these translation products produce stable, functional microproteins is still unclear, in part because microproteins are overlooked in conventional proteomics workflows [59].
Nevertheless, it is clear that at least some smORFs have important biological functions. In particular under conditions of stress, cells can translate so-called upstream open reading frames (uORFs), which are smORFs found in the 5′-UTRs of many stress-response genes and oncogenes. This may be especially important for cancer biology.
One hallmark of cancer is a general increase in mRNA translation, a downstream effect of various driver mutations such as amplification of the Myc oncogene [60,61]. This overwhelms cellular protein quality control mechanisms, resulting in chronic proteotoxic stress and activation of the integrated stress response (ISR) [62]. The ISR counteracts stress by repressing translation globally through eIF2α phosphorylation, while promoting unconventional, eIF2A-driven translation of uORFs. Consequently, cells undergoing proteotoxic stress, including cancer cells, exhibit a global shift towards translation from 5′-UTRs of mRNAs [63,64]. Translation of some of these uORFs is only required to protect the main ORF from the ISR-mediated translation shutdown, but others produce stable, functional microproteins. In the future, a systematic validation and investigation of the latter, for example, using proteomics, could reveal hundreds of novel proteins with important biological functions during the stress response or oncogenic transformation.

Non-functional, unstable proteins
Most research focuses on proteins that are stable and biologically active. Intriguingly, identifying unstable, non-functional translation products may also be important. Many newly synthesised proteins are rapidly degraded by the proteasome [65], including aberrant proteins removed by quality control mechanisms [66]. Some of the resulting peptides are presented by MHC-I as an extracellular signature, through which T-cells recognise and eliminate infected or cancer cells [67,68]. In particular, unstable proteins originating from stress-induced uORFs may form a cancer-specific extracellular signature that could be exploited for cancer immunotherapy [63]. Indeed, unconventional translation from non-coding regions is the main source of targetable tumour-specific antigens [69,70]. Due to their fleeting nature, few unstable proteins from non-coding regions are known so far, and fewer still can be linked to oncogenic transformation. Mass-spectrometry-based immunopeptidomics has made tremendous progress in the detection of such tumour antigens [71] and further method development in this area holds significant potential for cancer immunotherapy.

THE RELATIONSHIP BETWEEN mRNA AND PROTEIN LEVELS
The extent to which protein levels are determined by mRNA levels is a matter of ongoing research and debate. To address this issue, various "multi-omics" studies have measured both mRNA and protein levels and analysed their relationship (for example [72-77]). In principle, a strong correlation between transcriptome and proteome would indicate that proteins merely mirror expression changes of the corresponding mRNAs. In contrast, weak mRNA-to-protein correlations would suggest that protein abundance is controlled independently of mRNAs, that is, through translation and protein degradation. After initial results appeared to be conflicting [14,78], a growing consensus on the fundamental principles of protein regulation is now emerging. We will provide a brief summary here, as comprehensive reviews on this topic have already been published [50,79].
FIGURE 2 Across-gene versus across-sample comparisons of mRNA and proteins. An illustration of the two ways in which mRNA-to-protein correlations can be calculated, using tissue transcriptomics and proteomics data from ref. [11]. One can either calculate the correlation between mRNAs and proteins of different genes in a single sample (left), or the mRNA-to-protein correlation of a single gene across multiple samples (right). In general, across-gene correlations are significantly stronger than across-sample correlations, mainly because the former benefit from the large dynamic range of gene products produced by different genes.
Correlations between mRNAs and proteins can be calculated either across genes or across samples (conditions), representing two distinct biological questions (Figure 2). Across-gene comparisons enquire how different genes within the same cell produce proteins with vastly different intracellular concentrations (see Section 2.1). For most samples and organisms, substantial across-gene correlations have been observed. Typically, about 40% of the variability in protein abundances can be explained by the variability in the corresponding mRNA levels [50].
Across-sample (also known as "within-gene") correlations enquire to what extent relative expression changes of a particular protein can be explained by changes in its mRNA, for example when studying the fold-change of this protein across tissues or in response to perturbation (see Section 2.2). Across-sample correlations are typically weak. For example, while most genes have a positive mRNA-to-protein correlation across human tissues, only half of these correlations are statistically significant [11,16]. Depending on the study, the median Spearman correlation across tissues is only 0.35 [11] or 0.46 [16]. Across-sample mRNA-to-protein correlations vary widely between different genes (see Section 5) and depending on the nature of the biological conditions or perturbations that are being compared.
For instance, transcriptional regulation plays a bigger role in large abundance changes during tissue differentiation [11,77] compared to subtle changes resulting from genetic variation between individuals [74,75]. In general, the relative contribution of mRNA-level and protein-level regulation appears to depend on the magnitude of expression changes, with larger changes being more influenced by transcriptional regulation [31,80]. This observation may be influenced by technical limitations, as larger expression changes could result in stronger mRNA-protein correlations simply because they can be measured more accurately.
Nevertheless, these insights collectively suggest a gene regulation model in which a gene's on/off state and the order-of-magnitude of its expression are controlled at the mRNA level, via transcription and mRNA degradation. When adapting to new biological conditions, substantial expression changes may necessitate transcriptional adaptations, such as the epigenetic activation or inactivation of developmentally regulated genes [81,82]. However, most gene expression differences between biological states are quantitative, and are thus primarily controlled at the level of translation and protein degradation.
The fact that regulation via translation and protein degradation involves the "fine-tuning" of protein abundances does not mean it has a minor role in controlling the proteome. This becomes evident when one takes into account not just the number of differentially expressed proteins and their fold-changes, but also the combined impact these changes have on the total protein pool of the cell. Such insights are made possible by studies that determine absolute protein levels, i.e. copy numbers per cell. For example, lipopolysaccharide stimulation of murine dendritic cells triggers the expression of immune response proteins via transcriptional activation [83]. Consequently, most protein fold-changes are primarily determined by changes in their corresponding mRNAs. Nevertheless, more than half of the absolute change in protein molecules results from remodelling of the pre-existing proteome via translation and degradation [83]. The reason for this lies in the large dynamic range of proteins, where relatively small fold-changes in highly abundant housekeeping proteins can have a large combined impact on the total protein count within the cell.
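The difference between relative and absolute changes can be made concrete with a short Python sketch. It reuses the HeLa copy numbers for vimentin and FOS quoted earlier from ref. [12], but the fold-changes are hypothetical values chosen purely for illustration, and the helper name is an assumption; none of this reproduces the dendritic cell data of ref. [83].

```python
# Copy numbers per HeLa cell quoted earlier in the text (ref. [12]).
COPIES = {"vimentin": 20_000_000, "FOS": 6_000}

# Hypothetical fold-changes, chosen only for illustration.
FOLD_CHANGE = {"vimentin": 1.2, "FOS": 10.0}


def absolute_change(copies, fold_change):
    """Change in molecules per cell implied by a relative fold-change."""
    return copies * fold_change - copies


delta = {name: absolute_change(COPIES[name], FOLD_CHANGE[name]) for name in COPIES}

for name, change in delta.items():
    print(f"{name}: {FOLD_CHANGE[name]}-fold -> {change:,.0f} additional molecules per cell")
```

In this toy case, a modest 1.2-fold change in an abundant structural protein adds far more molecules to the cell than a 10-fold induction of a rare transcription factor.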
Finally, an intriguing distinction between relative and total expression changes emerges from the analysis of yeast cells undergoing quiescence [9]. During quiescence, the total mRNA concentration shrinks to a third of its proliferating cell level, even after adjusting for the concomitant reduction in cell size. However, this is a global reduction that affects most mRNAs equally, and few individual mRNAs display relative expression changes compared to proliferating cells.
On the other hand, the total intracellular protein concentration is only marginally reduced during quiescence. Despite this, the proteome undergoes substantial remodelling. Nearly half of all proteins show more than two-fold relative changes, including the downregulation of growth-related proteins and upregulation of stress response factors [9].
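The across-gene and across-sample correlations distinguished earlier in this section (Figure 2) can be sketched in a few lines of Python. The abundance values below are invented toy numbers, not data from ref. [11], and the gene names and helper functions are hypothetical. The sketch computes a Spearman correlation as the Pearson correlation of ranks, once across genes within a single sample and once for a single gene across samples.

```python
from statistics import mean


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


# Toy abundances: one row per gene, one column per sample (invented values).
mrna = {
    "geneA": [1.0, 2.0, 1.5, 1.2],
    "geneB": [10.0, 12.0, 11.0, 9.0],
    "geneC": [100.0, 90.0, 120.0, 110.0],
    "geneD": [1000.0, 1100.0, 900.0, 950.0],
}
protein = {
    "geneA": [5.0e3, 4.0e3, 6.0e3, 4.5e3],
    "geneB": [2.0e5, 1.8e5, 2.1e5, 2.2e5],
    "geneC": [8.0e4, 9.0e4, 7.0e4, 8.5e4],
    "geneD": [3.0e7, 2.8e7, 3.1e7, 2.9e7],
}
genes = list(mrna)

# Across-gene: all genes within the first sample.
across_gene = spearman([mrna[g][0] for g in genes], [protein[g][0] for g in genes])

# Across-sample ("within-gene"): one gene across all samples.
across_sample = spearman(mrna["geneA"], protein["geneA"])

print(f"across-gene rho = {across_gene:.2f}, across-sample rho = {across_sample:.2f}")
```

In this toy data set the across-gene correlation is high because the genes differ by orders of magnitude, whereas the across-sample correlation for the single gene is weak because its fold-changes between samples are small, mirroring the explanation given in the Figure 2 legend.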

REASONS FOR THE DISCORDANCE BETWEEN mRNA-PROTEIN LEVELS
The disparity between mRNA and protein levels across biological conditions prompts a fundamental biological question: what are the reasons behind this divergence, and which mechanisms are responsible for it? There are multiple emerging explanations, which could be classified into three groups: analytical challenges, biological regulation, and the buffering of expression variation.

Analytical and technical difficulties
One analytical challenge stems from the fact that mRNA and protein levels are measured using fundamentally distinct technologies, for example, RNA-sequencing and mass spectrometry. These technologies have different biases and limitations, contributing to the observed disparities in mRNA-to-protein correlations. However, the fact that these disparities persist, despite considerable advancements in measurement technologies over the years, suggests that analytical inaccuracies are not their primary source. Weak mRNA-to-protein correlations can also reflect inadequate sample matching rather than substantial post-transcriptional control. When cells respond to perturbations, there is a time delay between mRNA and protein level changes.
For example, in dendritic cells stimulated with lipopolysaccharide, protein fold-changes after 12 h correlate best with mRNA changes at 5 h [83]. Additionally, weak mRNA-to-protein correlations for secreted proteins arise because mRNA and protein are no longer present in the same cells [16]. Related to this, blood contains proteins secreted from a wide range of cell types, making a direct comparison to mRNA levels practically impossible.

Impact of biological gene regulation
Despite such analytical challenges, the observed mRNA-to-protein correlations indicate that most proteins are subject to substantial regulation at the level of translation and degradation. However, the extent of post-transcriptional control differs strongly between genes.
For example, housekeeping genes, which are ubiquitously expressed but may differ quantitatively between tissues, are more likely to be regulated post-transcriptionally, leading to weaker mRNA-protein correlations. Conversely, tissue- or condition-specific genes induced in response to external stimuli exhibit stronger correlations [83]. This is expected from a biological perspective, as post-transcriptional regulation becomes less important for genes that are activated solely when the corresponding protein is needed. For a small number of genes, condition-specific expression can also be achieved via translational activation of widely transcribed, inactive mRNAs [84,85]. This mechanism may be employed for proteins required in rapid response to a stimulus, since upregulating translation is the fastest way to upregulate protein levels [86] and avoids constitutively expressing the protein when it is not needed [50,79]. Examples include the yeast transcription factor GCN4, which is translated upon amino acid starvation [79,87], and ferritin in humans, whose mRNA is translationally repressed until the protein is needed to sequester iron [88,89]. Translational repression of mRNAs followed by translational activation is also important during early development to establish spatiotemporal expression patterns [90]. It appears that the fast response time afforded by such "translation on demand" scenarios outweighs the energetically unfavourable solution of producing an inactive pool of mRNAs [85].
Another emerging theme is the regulation of protein stability by post-translational modifications (PTMs) [44-47]. PTMs can either activate or inactivate degrons (protein regions that control degradation), leading to the degradation or stabilisation of a target protein, respectively. Phospho-activated degrons are common in cell cycle regulators, controlling their periodic degradation, but phosphorylation can also stabilise proteins. A classic example is the phosphorylation of p53 upon DNA damage, which lowers its affinity for the E3 ubiquitin ligase MDM2, leading to its stabilisation and accumulation in cells [91].
Another well-documented case is the stabilisation of the transcription factor HIF-1α, which controls the cellular response to hypoxia. Under normal conditions, prolyl hydroxylation of HIF-1α promotes its degradation via the VHL E3 ubiquitin ligase, but a decrease in hydroxylation during hypoxia allows HIF-1α to escape degradation [46]. The role of PTM-controlled protein stability as a general regulatory mechanism is still poorly understood. Similar to "translation on demand", it appears to be energetically unfavourable and to predominantly affect regulatory proteins. However, the potential benefit for controlling key cellular proteins is evident: the mechanism links protein expression control to signalling pathways and could thus allow cells to rapidly sense and adjust to physiological changes.
Another phenomenon that contributes to the discordance of mRNA and protein levels is non-exponential degradation (NED), which affects more than 10% of proteins in mammalian cells [41]. Many of these proteins are subunits of protein complexes that are produced in superstoichiometric amounts, with excess molecules subsequently being degraded by the proteasome. Stabilisation by incorporation into complexes may be yet another strategy that prioritises coordination and fine-tuning of protein levels over energy consumption. Intriguingly, this appears not to be the case for bacteria, which can translate subunits of protein complexes in precise proportion to their stoichiometry [92].

Protein-level buffering counters non-functional genomic influences
In some biological contexts, the role of translation and degradation regulation is not to induce protein level changes, but to prevent them. In particular, accumulating evidence suggests that an important role of protein-level regulation is to buffer against potentially detrimental mRNA expression changes, which can arise as an unintended consequence of the intricate and diverse biology of the genome [73-77].
Buffering of gene proximity effects. Genes are not randomly distributed across the human genome [93]. More than 10% of human genes are organised as so-called bidirectional gene pairs, which are divergently transcribed from a shared promoter region [94]. Furthermore, genes often form clusters that extend along the sequence [95] or the 3D structure of the genome [75,96]. Genes from such pairs or clusters tend to be coexpressed at the mRNA level [75,93,97].
To make sense of this intriguing observation, it is essential to consider it in the context of genome biology. The genome has a multitude of biological functions that extend beyond gene expression. For example, it not only encodes and replicates the genetic material but also safeguards it against DNA damage, retrotransposons and viral elements.
In eukaryotes, these processes take place within a dynamic chromatin environment, involving a vast array of proteins, including histones, histone-modifying enzymes, DNA repair and replication factors, and numerous others. The evolution of the genome is therefore influenced by a complex set of functional requirements that are not necessarily compatible with precise, quantitative control of gene expression.
The widespread existence of spatially proximal, but functionally unrelated human genes can be attributed to two phenomena. First, it can be a consequence of how new genes originate. Bidirectional gene pairs may arise if an initially non-coding antisense transcript evolves into an open reading frame with a new cellular function [102]. Large clusters of active genes may reflect the susceptibility of open chromatin regions towards acquiring new genes via retroposition [93]. Second, the organisation of genes into pairs and clusters is thought to have evolved to minimise gene expression noise [103,104]. Such expression noise could for example arise from fluctuating heterochromatin domains, which may stochastically spread into active chromatin regions, leading to random gene silencing in those regions [75,105].
Genes that are organised in pairs or clusters have significantly lower expression noise, possibly because they reinforce each other's activity state through more robust recruitment of transcriptional regulators [103,104]. Pairs and clusters are enriched in dose-sensitive genes such as those encoding housekeeping functions and protein complex subunits [95,104].
The trade-off for these advantages of spatial proximity is that the affected genes cannot be controlled precisely and individually at the level of transcription. For example, transcriptional activation can lead to a ripple effect that activates nearby genes, even if they are functionally unrelated [106,107]. In addition, if a region with multiple genes fluctuates between active and inactive chromatin states, those genes will effectively be coexpressed [105]. As a result, coexpression of spatially proximal genes, and genes that are within or near epigenetically similar chromatin domains, is pervasive at the mRNA level. However, we and others have shown that such non-functional mRNA coexpression is buffered at the protein level [75,76]. Ultimately, protein-level buffering through mechanisms outside the nucleus may allow cells to uncouple precise gene regulation from evolutionary constraints affecting other functions of the genome.
Buffering of genetic variation and mutations. Genetic variation, including point mutations, frequently affects the rate of transcription and the stability of mRNAs, but its impact on the protein concentrations of the same genes is often attenuated or absent [50,79]. This effect has been observed, for example, for genetic variation between individuals [74], between mouse strains [108] and between species [73,109]. One type of potentially detrimental mRNA expression change is that resulting from losses or gains in gene copies. Gene amplifications or losses generally result in proportionally changed mRNA levels [110], but the corresponding protein levels are often attenuated towards wild-type levels [72,111–115].
Copy number alterations are extremely common in cancer cells [116,117]. Their impact on gene dosage would likely be detrimental for cell fitness without extensive post-transcriptional buffering. Therefore, identifying the key buffering mechanisms may open up exciting new strategies for targeting a wide range of cancers. One known buffering mechanism is the aforementioned non-exponential degradation of surplus molecules of complex subunits, but this only affects about 10% of human proteins [41]. Autoregulation has been proposed as an additional mechanism, for example when E3 ubiquitin ligases trigger the degradation of their own excess molecules [50]. However, such mechanisms will also be restricted to a subset of cellular proteins, leaving a major proportion of protein-level buffering events unexplained. An interesting model system for future studies may be naturally occurring yeast strains, where aneuploidy is common and appears to have little effect on cell fitness in nature [118]. It has recently been shown that, in contrast to lab strains, the impact of aneuploidy on protein levels is partially buffered in wild yeast isolates, through a mechanism that involves the proteasome [119].

PROTEOMIC TECHNIQUES TO STUDY TRANSLATION AND PROTEIN DEGRADATION
The relationship between transcriptome and proteome has been predominantly investigated using methods that determine quantitative differences between biological states, such as RNA-sequencing and quantitative proteomics. While this approach can provide valuable insights into the correlation between mRNA and protein levels, it often does not reveal how the proteome was regulated to produce the observed differences. For example, where protein abundance is controlled separately from the mRNA, abundance measurements alone cannot discern whether the variations result from altered translation or altered protein degradation. One notable method that has been invaluable in studying protein regulation at the level of translation is ribosome profiling, which provides information on the translational activity of mRNAs [120].
Another promising approach to gain a deeper mechanistic understanding of protein-level regulation is metabolic pulse-labelling of proteins with heavy isotopes. This refers to a set of proteomic methods known as "pulse SILAC" or "dynamic SILAC" (stable isotope labelling by amino acids in cell culture; for comprehensive recent reviews on these techniques we refer the reader to refs. [24,121]). In brief, this approach entails switching a cell culture from standard growth medium to one with isotope-labelled amino acids, typically arginine and lysine. These labelled amino acids are then incorporated into newly synthesised proteins, enabling the differentiation between new (heavy) and preexisting (unlabelled) proteins by mass spectrometry (Figure 3). By monitoring the decrease in light signal (degradation of preexisting proteins) and the increase in heavy signal (synthesis of new proteins) over time, this method allows for simultaneous determination of turnover kinetics for thousands of proteins, including rates of protein synthesis and degradation, as well as half-lives [24,122,123]. When combined with a biological perturbation, pulse-SILAC becomes an instrument to distinguish expression changes driven by a change in protein synthesis from those induced via a change in protein degradation [124,125]. For example, the JQ1-VHL PROTAC was shown to rapidly trigger the degradation of several bromodomain proteins (reduction in preexisting protein) and induce the downstream synthesis of ferritins (delayed upregulation of nascent protein) [125].
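As a rough illustration of how turnover kinetics can be derived from such time courses, the following Python sketch fits a single first-order loss rate to the decay of the light (preexisting) fraction and converts it into a degradation half-life. It assumes steady-state protein levels, simple exponential decay and a known cell doubling time to correct for dilution by cell division. These assumptions, the function names and the synthetic data are ours for illustration only and do not reproduce the analysis pipeline of any cited study; proteins that are degraded non-exponentially would require a different model.

```python
import math


def fit_loss_rate(times_h, light_fractions):
    """Estimate a first-order loss rate (per hour) of preexisting protein.

    Assumes light_fraction(t) = exp(-k_loss * t), so -ln(fraction) is linear
    in time with slope k_loss. The slope is fitted by least squares through
    the origin. All fractions must lie in (0, 1] and all times must be > 0.
    """
    numerator = sum(t * -math.log(f) for t, f in zip(times_h, light_fractions))
    denominator = sum(t * t for t in times_h)
    return numerator / denominator


def degradation_half_life(k_loss, doubling_time_h):
    """Convert the apparent loss rate into a degradation half-life (hours).

    In dividing cells the light signal also drops through dilution, so the
    dilution rate ln(2) / doubling time is subtracted first (an assumption
    of this sketch). Returns infinity if no degradation remains.
    """
    k_dilution = math.log(2) / doubling_time_h
    k_deg = k_loss - k_dilution
    if k_deg <= 0:
        return math.inf
    return math.log(2) / k_deg


# Synthetic, noise-free example: light fraction sampled after the medium switch.
times = [2.0, 4.0, 8.0, 16.0]                     # hours of heavy labelling
fractions = [math.exp(-0.1 * t) for t in times]   # simulated with k_loss = 0.1 / h

k_loss = fit_loss_rate(times, fractions)
half_life = degradation_half_life(k_loss, doubling_time_h=24.0)
print(f"k_loss = {k_loss:.3f} per h, degradation half-life = {half_life:.2f} h")
```

With these synthetic inputs, the fit recovers the simulated loss rate of 0.1 per hour, and the dilution-corrected half-life is just under 10 hours. Real data are noisy, so a weighted or non-linear fit across many peptides per protein would usually be preferable.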
The ability to differentiate between mature and nascent proteins holds immense potential for advancing our understanding of protein expression regulation. However, so far pulse-SILAC has not been used widely. One reason for this could be its limited throughput, as it may take multiple days to complete the proteomic analysis of a single experimental condition. To overcome this limitation, several groups have recently reported multiplexing strategies combining SILAC with isobaric labelling [43,48,125,126]. Multiplexing pulse-SILAC time points also has the advantage of providing quantitation within a single analysis, thus reducing run-to-run variation. Another limitation of pulse-SILAC is the restricted sensitivity for newly synthesised proteins, requiring multiple hours of pulse-labelling to accumulate sufficient signal in the heavy channel, which makes it difficult to study low-abundance proteins and transient cellular responses. The methionine analogue azidohomoalanine (AHA) has been proposed as a solution for this problem [127]. AHA is incorporated into nascent proteins and can be enriched using click chemistry. Enrichment of newly synthesised proteins enhances their detectability, allowing for shorter pulse times. A downside to this approach is that the depletion of preexisting proteins results in a loss of degradation information. However, AHA labelling can also be performed in a pulse-chase set-up, where temporal sampling after AHA removal can monitor degradation of the AHA-labelled proteome [41]. Promising recent advances for studying nascent proteomes are multiplexing strategies for AHA-labelled samples [128], and the development of new bioorthogonal enrichable amino acids, such as a threonine analogue [129].
An exciting future possibility that may increase both the throughput and sensitivity of pulse-SILAC experiments is offered by the advent of data-independent acquisition mass spectrometry (DIA-MS) [130]. Historically, the analysis of complex proteomes required sample prefractionation, especially for SILAC samples, which are more complex than their label-free counterparts. However, DIA-MS's improved sensitivity enables high proteome coverage without prefractionation. DIA-MS has thus become a cornerstone of novel high-throughput proteomics approaches, which significantly boost the speed and robustness of proteome analysis [131]. Until recently, SILAC samples were not suitable for DIA-based proteomics, but new software developments have paved the way for SILAC experiments to be analysed by DIA [132], with promising initial results [44,47,133–135]. In the future, establishment and benchmarking of robust data processing strategies for global proteome analysis by DIA-pulse-SILAC could enable high-throughput proteomic screens that determine the impact of perturbations on protein synthesis and on protein degradation.

OUTLOOK
In this review, we aimed to provide an overview of cellular control of protein levels from a quantitative, systems biology perspective. The genome, transcriptome and proteome differ substantially in quantity, stability and dynamic range, and these differences are tightly linked to the biological functions of these macromolecules.
In general, due to the widespread impact of translational and post-translational regulation, protein levels are more closely linked to gene function, whereas mRNA levels better reflect the structure and organisation of the genome. Our quantitative and kinetic understanding of protein expression control is still relatively limited, and the field has opened up many new questions about protein-level regulation. For example, cells regulate proteins that belong to the same biological process in a coordinated manner, even if they are not physically interacting [31]. How do they achieve this if mRNA coexpression tends to be buffered at the protein level? What exactly are those buffering mechanisms? Furthermore, many crucial questions about how protein levels are controlled via degradation remain unresolved [24], such as the target specificities of most of the 600 human E3 ubiquitin ligases.
Emerging technologies, such as single-cell proteomics [136,137], are poised to further enhance our understanding of the relationship between mRNA and proteins. So far, comparisons between single-cell proteomes and single-cell transcriptomes corroborate findings from bulk analyses. mRNA-to-protein correlations tend to be relatively weak also at the single-cell level. Moreover, proteins show less expression variation than mRNAs and, consequently, proteomes are more stable across single cells than transcriptomes [138–140]. This may reflect translational or post-translational buffering of protein levels against mRNA fluctuations, such as those arising from transcriptional bursting at the single-cell level. However, active biological regulation also appears to contribute to the observed deviations of protein from mRNA abundances at the single-cell level. For instance, a recent single-cell analysis of the cell cycle revealed hundreds of proteins whose abundances are partially cell-cycle-dependent [141]. Most of these proteins cycle independently of their mRNA counterparts, possibly because their stability is controlled post-translationally via phosphorylation.
Advancing our understanding of protein-level regulation will be important for fighting disease. For example, anomalies in protein-level regulation drive or sustain the growth of many types of cancer. Dysregulation of protein levels in cancer is not restricted to individual oncogenes, but extends to a systemic imbalance of translation and degradation [142]. Understanding the (im)balance between translation, degradation and proteotoxic stress could help to identify and better exploit cancer cell vulnerabilities. For example, translation and degradation inhibitors show great promise for the treatment of heterogeneous cancers, because multiple pathways that are activated in cancer ultimately converge on these processes [143,144]. However, sensitivity of patients to proteasome inhibitors is highly heterogeneous, so identifying markers of proteome states that are sensitive to such drugs could help to improve outcomes [62].
For such applications in biomarker discovery and precision medicine, how well the genome and transcriptome reflect the proteome is an important consideration. For example, many genes associated with Leigh syndrome show tissue enrichment at the protein level, but not the mRNA level [16]. However, in this context it is also important to note that RNA-sequencing is presently a more mature, robust and readily accessible method than proteomics, and also typically covers more genes per analysis. Moreover, the fact that mRNA levels are not generally buffered against genomic abnormalities can also be an advantage, for example for the identification of chromosome abnormalities that are ubiquitous in cancers [145].
A critical aspect for the success of future clinical applications lies in advancing our insights from cell culture models to the organism level. Recent large-scale initiatives are assembling comprehensive atlases of the complex array of cell types and cell states across the human body, along with the associated transcriptomic and proteomic diversity [15,146]. These studies have uncovered a wide range of cell types exhibiting diverse gene expression programs within various tissues, including the heart and brain [147–150]. Spatially resolved transcriptomics [151] and the emerging technique of spatial proteomics [152] are set to significantly influence such studies. For example, recent applications of these techniques have revealed substantial variation in the transcriptome and proteome of hepatocytes depending on their location within the liver [153,154].
The discordance between protein and mRNA levels also has practical implications for non-clinical transcriptomics and proteomics applications. For example, a major challenge in biology is the widespread occurrence of understudied proteins [155–157]. One successful method to predict potential biological functions of such uncharacterised genes is coexpression analysis. As may be expected from the discussion above, detecting functional associations by protein covariation analysis significantly outperforms mRNA coexpression analysis [76,158,159]. Recent developments in high-throughput proteomics offer exciting future opportunities for gene function prediction across many species. First, functionally related proteins have been observed to co-vary across gene deletion strains [20–22]. In addition, such studies enable new strategies for function prediction, which are based on observing the similarity of knock-outs [18–21], as well as the abundance change of a protein across a knock-out collection (reverse protein profile) [21]. These guilt-by-association approaches, which require systematic screens, have been shown to outperform the traditional approach of studying gene function by directly examining the proteins that change in a specific knock-out [21]. This highlights the need for more systematic and comprehensive approaches for understanding gene and protein functions as well as their relationships.
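The guilt-by-association idea behind covariation analysis can be sketched in a few lines of Python. The example below ranks candidate partners of an uncharacterised protein by the Pearson correlation of their abundance profiles across samples. The protein names, the abundance values and the choice of Pearson correlation are illustrative assumptions on our part; the cited studies may use larger datasets and more elaborate similarity measures.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def rank_partners(profiles, query):
    """Rank all other proteins by their correlation with the query profile."""
    scores = {
        name: pearson(profiles[query], values)
        for name, values in profiles.items()
        if name != query
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical log2 abundance changes across five samples (e.g. knock-out strains).
profiles = {
    "uncharacterised_X": [0.1, 0.8, -0.5, 1.2, -0.9],
    "complex_subunit_A": [0.2, 0.9, -0.4, 1.0, -1.0],
    "unrelated_B": [0.5, -0.3, 0.4, -0.2, 0.1],
}

ranking = rank_partners(profiles, "uncharacterised_X")
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

In this toy example the hypothetical complex subunit covaries strongly with the query protein and is ranked first, which would suggest a shared function to be tested experimentally. In practice, many samples and a correction for multiple testing are needed before such associations can be trusted.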
Finally, no single technology will be sufficient on its own to fully understand how cells regulate protein levels. This multi-disciplinary problem will need to be tackled with a multi-disciplinary approach. Consequently, the further integration of genomic, transcriptomic and proteomic technologies will be crucial to elucidate this fundamental biological process.

Figure 3. Pulse-SILAC enables the calculation of protein synthesis and degradation rates. (A) Schematic drawing of a typical pulse-SILAC set-up. Cells grown in standard medium are switched to a medium containing heavy isotope labelled lysine and arginine, allowing the distinction of newly synthesised (heavy) and preexisting (light) proteins. (B) From the change of light and heavy signal over time, protein synthesis and degradation rates can be determined.