Noncanonical functions of the serine‐arginine‐rich splicing factor (SR) family of proteins in development and disease

Members of the serine/arginine (SR)‐rich protein family of splicing factors play versatile roles in RNA processing steps and are often essential for normal development. Dynamic changes in RNA processing and turnover allow fast cellular adaptations to a changing microenvironment and thereby closely cooperate with transcription factor networks that establish cell identity within tissues. SR proteins play fundamental roles in the processing of pre‐mRNAs by regulating constitutive and alternative splicing. More recently, SR proteins have also been implicated in other aspects of RNA metabolism such as mRNA stability, transport and translation. These emerging noncanonical functions highlight the multifaceted roles of SR proteins and identify them as important coordinators of gene expression programmes. Accordingly, most SR proteins are essential for normal cell function and their misregulation contributes to human diseases such as cancer.


INTRODUCTION
The transcriptome is a defining feature of cellular identity, and gene expression is often surprisingly heterogeneous at the single-cell level during biological processes such as cell differentiation. Splicing is one mechanism linking transcriptome heterogeneity to proteomic diversity. [1,2] Approximately 95% of multiexon genes undergo alternative splicing in humans. [3] Further transcriptional regulation and cross-talk between the cytoplasm and the nucleus allow cells to respond rapidly and specifically to external stimuli. RNA processing factors are emerging as key players in the regulation of the transcriptome and proteome. [...] in splicing functions. [4,20,21] Moreover, SR proteins have several roles in splicing. Firstly, they bind exonic splicing enhancers and promote the inclusion of weak splice sites by enabling spliceosome assembly (Figure 2A). However, the precise location of the SR protein binding event on a nascent transcript can alter the functional outcome of the splicing events. [21,22] For example, SR proteins bound to exons usually recruit U1 and U2 small nuclear ribonucleoproteins (snRNPs), which then allows the spliceosome to assemble and splicing to occur (Figure 2B, left panel). In contrast, SR protein binding to intronic sequences inhibits recruitment of U1 and U2 (Figure 2B, right panel). [21,22] When two SR proteins bind adjacent exons, they compete for U2 snRNP recruitment, and the SR protein best able to recruit U2 determines where splicing occurs (Figure 2C). [21,23,24] However, not all SR proteins promote splicing. SRSF10 and SRSF12 also act as global repressors of splicing, depending on their phosphorylation state. [17] Phosphorylation is essential for SR proteins to function in pre-mRNA splicing and is carried out by SR protein-specific kinases (SRPKs). [25] Phosphorylation and dephosphorylation events occur in a cyclical manner and determine the intranuclear distribution of splicing factors and their location within cells, as well as impacting their functions. [25] In contrast to their redundant functions in splicing, SR proteins are often essential for other, currently less well-described roles in transcriptional activation, nonsense-mediated decay (NMD), mRNA export and translation. [26,27] The breadth of these additional roles and the apparent lack of redundancy for these noncanonical functions suggest that SR proteins are key RNA processing factors with fundamental roles in coordinating many steps of gene expression. Accordingly, SR proteins are required for normal development. Yet, how precisely SR proteins function in healthy tissues and how their misregulation contributes to diseases remain largely unclear.
Here, we focus on the noncanonical functions of SR proteins and discuss the importance of these functions in development and disease, with a focus on cancer.

SR proteins link transcription with splicing
Splicing mainly takes place during transcription, bringing the catalytic spliceosome into close proximity with the RNA polymerase II (Pol II) complex. [28] While it remains unclear how precisely these two regulatory complexes interact, it is now widely assumed that transcription and splicing are mostly concurrent events. [28] The earliest regulatory role for SR proteins in gene expression is the coordination of promoter-proximal pause release of Pol II. [27,29,30] Pol II promoter-proximal pausing occurs shortly after transcription initiation. After Pol II escapes the promoter, it becomes phosphorylated at serine 5 of the C-terminal heptapeptide repeat domain (CTD). Pausing of Pol II then occurs 30-50 bases downstream of the transcription start site. To enter into productive elongation, serine 2 of the CTD must be phosphorylated along with the pausing factors DRB sensitivity-inducing factor and the negative elongation factor. [31] These phosphorylation events are carried out by CDK9, the kinase domain of the positive elongation factor b (P-TEFb), which is held in an inhibitory complex by HEXIM and LARP7 with the long noncoding RNA (lncRNA) 7SK (Figure 3).
Both SRSF1 and SRSF2 are part of the 7SK snRNP complex. [29] Knock-down of SRSF1 or SRSF2 leads to a block in nascent RNA production, and the pool of Pol II phosphorylated at serine 2 is reduced by 40%. [32] SRSF2 promotes pause release because it has a higher affinity for single-stranded RNA than for the 7SK complex, and thus, SRSF2 leaving the complex leads to successive dissociation of P-TEFb from the complex. [29] This in turn leads to phosphorylation of serine 2 in the CTD of Pol II and induces the release of the pausing factors, thereby allowing for productive elongation. [29,33] Whether SRSF1 acts through the same mechanism or simply cooperates with SRSF2 remains to be determined. Thus, a direct function in transcriptional activation may be unique to SRSF2. [29] Promoter-proximal pausing enables rapid and dynamic changes in transcription, in particular of regulators involved in signal transduction. [34][35][36] Pol II pausing is a common feature of gene regulation during development and stem cell differentiation, yet it does not occur at all genes. [36][37][38][39] In addition to SRSF2, SRSF3 and SRSF4 have also been shown to interact with 7SK in cross-linking and immunoprecipitation experiments. [40] Although the functional consequences of these interactions are unknown, they may involve the coordination of transcription termination. [41] The interaction with 7SK might bring SRSF3 into close proximity to Pol II to induce the release of Pol II from DNA. Inefficient processing of the nascent transcript, such as errors in capping, splicing or 3′-end formation, leads to RNA nuclear retention, degradation or both. Aberrantly processed mRNAs are degraded through the nuclear exosome. [42] SRSF3 can interact with the exosome to destabilize targeted mRNAs. [41,43] Thus, SRSF3 may link termination with mRNA export. [44]

SR proteins regulate mRNA export
Export of mRNAs from the nucleus to the cytoplasm is coupled to upstream pre-mRNA splicing events and downstream NMD (Figure 4). [45] Therefore, it is unsurprising that SR proteins have also been linked to mRNA export. The majority of SR proteins shuttle between the cytoplasm and the nucleus. [46] SRSF2 and SRSF5 are the only reported exceptions and are predominantly localised to the nucleus. SRSF2 contains a hydrophobic nuclear retention signal in its RS domain. [47] The reason why SRSF5 is retained in the nucleus is unclear. Interestingly, the cellular location of both SR proteins was dependent on cellular differentiation status in an embryonic carcinoma cell line (P19). Both SRSF2 and SRSF5 shuttled between the nucleus and cytoplasm in undifferentiated but not differentiated cells. [48] This raises the interesting possibility that SR proteins facilitate mRNA export in a cell state-specific manner. [44] Further studies focusing on the dynamics of nuclear versus cytoplasmic localisation revealed that most SR proteins shuttle continuously, but the majority of the SR protein pool is found in the nucleus under homeostatic conditions. [46] Reimport of proteins from the cytoplasm to the nucleus is often dependent on the level of active transcription. Inhibition of transcription using actinomycin D or 5,6-dichlorobenzimidazole 1-β-ribofuranoside (DRB) led to enhanced reimport of SRSF7, while SRSF1 and SRSF3 were unaffected. [46]

SR proteins regulate nonsense-mediated decay
[...] (Figure 4). [49,50] This was true for all SR proteins tested, which included SRSF1, SRSF2, SRSF5 and SRSF6. Mechanistically, SR proteins enhance NMD by promoting the inclusion of a PTC, which leaves transcripts susceptible to degradation.
NMD is dependent on the formation of the exon junction complex (EJC). [51,52] The EJC is a large protein complex deposited on 80% of exon-exon junctions as a consequence of splicing. [26,51] The EJC determines structure, composition and fate of spliced mRNAs.
Thus, in addition to being required for NMD, the EJC also functions in mRNA export and in promoting the pioneer round of translation, which is important for mRNA quality control. [26,51,53] Several members of the SR protein family interact with the EJC. [49,51,54] These interactions stabilise the binding of the EJC to mRNA, which in turn also enhances NMD pathway member binding, leading to increased degradation of reporters containing PTCs. [50] Interestingly, all members of the SR protein family themselves contain a highly conserved PTC, also called a poison exon, allowing them to regulate their own degradation, and hence expression levels. [40,55,56] In addition to regulating itself, SRSF3 also regulates the stability of SRSF2, SRSF5 and SRSF7 transcripts. [40] How precisely SR proteins interact with the NMD complex remains unclear because this may depend on the specific SR protein. SRSF1's interactions with the NMD complex have been studied in more detail. SRSF1 promotes NMD by recruiting UPF1, a core member of the NMD pathway. [49] SRSF1 stabilises UPF1 binding to mRNA in an RNA-dependent manner. Using reporter genes, the study demonstrated that SRSF1 promoted NMD of 50% of endogenous transcripts known to be NMD targets. [49] SRSF2 function in NMD has also been studied in more depth, but unlike SRSF1 it acts in an EJC-dependent manner. [54] Thus, SR proteins act in a number of ways to enhance the activity of the NMD pathway.

FIGURE 3 Schematic overview of SRSF2 function in transcription and splicing. SRSF2 is bound to the pause release complex and the spliceosome

SR proteins interact with noncoding RNAs
SR protein binding is not limited to pre-mRNA; SR proteins also associate with a variety of abundant lncRNAs including 7SK, MALAT1 and VTRNA1.1 (Figure 4). [24,29,57-59] VTRNAs are integral components of large ribonucleoprotein vault particles found in the cytoplasm of most eukaryotic cells. [60] Binding of SRSF2 to VTRNA1.1 counteracts its processing into vault-derived regulatory small RNAs, which have been implicated in multidrug resistance and cell differentiation. [59,61] The nuclear lncRNA MALAT1 sequesters SR proteins in nuclear speckles, [58] where vital molecular processes, including chromatin organization, transcription and RNA processing, are controlled. [62] Release of SR proteins from nuclear speckles requires phosphorylation of the RS domain, as this modification weakens the protein-RNA interactions. [58] Thus, MALAT1 modulates the cellular levels of active SR proteins in the nucleus. In addition, SRSF1, SRSF3 and SRSF4 interact with microRNAs, snoRNAs and ncRNAs, yet the precise functions of these protein-RNA interactions remain largely unexplored. [40,63] Possible functions of SR protein interactions with noncoding RNAs include regulation of their transcription, stability or cellular localisation.

SR proteins function in mRNA translation
SR family members shuttling between the nucleus and cytoplasm, in particular SRSF1, SRSF3 and SRSF7, have been implicated in the regulation of mRNA translation (Figure 4). SRSF1, for instance, cosediments with both the 80S fragment of the ribosome and polysomes and enhances translation of intron-less as well as intron-containing reporter genes. [64] While the RS domain was required for SRSF1's role in translation, the RRM domain was not, suggesting that at least some functions of SR proteins in mRNA translation are independent of splicing. [64] In the cytoplasm, SRSF1 is recruited to stress granules in response to various stresses such as oxidative stress, osmotic shock and heat. [65,66] Stress granules store stalled translation preinitiation complexes. SRSF3 has been shown to repress translation of programmed cell death 4 (PDCD4) by binding to its 5′ UTR and recruiting the mRNA to P-bodies, where translationally silenced mRNAs are often deposited. [67] These data implicate cytoplasmic SR proteins in translation repression in response to cellular stress.

FIGURE 4 SR proteins function in diverse cellular processes. Specific SR proteins play unique roles in regulating gene expression. All regulate splicing, but only some also regulate transcriptional processes. SR proteins can enhance nuclear exosome decay when over-expressed and some bind to noncoding (nc) RNAs in the nucleoplasm. Shuttling SR family members contribute to mRNA export and enhance the pioneer round of translation. Over-expression of SR proteins enhances nonsense-mediated decay (NMD) and several SR proteins autoregulate their own expression

A subset of SR proteins autoregulate themselves at the translational level (Figure 4). [68] Over-expression of SRSF7, but also SRSF5 and SRSF6, leads to the inclusion of a PTC that splits the full transcript into two, one of which contains the RRM domain and the other the RS domain. This process is referred to as split-open reading frames. SRSF7 binding then protects these short transcripts from degradation via NMD and thereby enables their translation. While the truncated proteins containing the RRM domain can bind mRNA, they are unable to recruit the spliceosome. Moreover, the truncated protein is not sequestered in nuclear speckles and can outcompete full-length SRSF7, leading to the production of unspliced transcripts. These unspliced transcripts then act as architectural RNAs, which sequester both intron-containing and normal SRSF7 transcripts, ultimately restoring normal protein levels. [68] This autoregulatory feedback loop occurs at low levels during homeostasis, but is believed to be important during carcinogenic transformation and viral infections. [68]

Splice factor kinases regulate SR protein functions
The regulatory functions of SR proteins in transcription, splicing, NMD and mRNA export are often directly linked to the phosphorylation levels of the proteins. [69,70] In order to leave nuclear speckles and bind RNA, SR proteins are required to be phosphorylated. [58] However, during productive splicing and shuttling from the nucleus to the cytoplasm, SR proteins are in a hypophosphorylated state. The interplay of kinases and phosphatases with SR proteins is intricate and plays a key regulatory role. For instance, over-expression of the SR protein kinase CDC2-like kinase 1 (CLK1) weakens splice-site selection because it lacks the mechanism to release phosphorylated SR proteins. [71] The release of phosphorylated SR proteins requires a second kinase, serine-arginine protein kinase 1 (SRPK1), which then promotes efficient splice-site recognition and subsequent spliceosome assembly. [71] Human CLK2 and CLK3 also regulate the nuclear redistribution of SR proteins and thereby regulate alternative splicing. [72] Phosphorylation is a common mechanism to activate or deactivate proteins, in particular in signalling pathways. Hyperactivation of protein kinases is commonly linked to human diseases, most often cancer. [73]

SR proteins are required for normal development
Animal models are commonly used to define the functional roles of genes in development and disease. Knockout animal models revealed that most SR proteins are essential for normal development in flies, worms and mice. In Drosophila and mice, all knockout approaches caused embryonic lethality. In contrast, only depletion of SRSF1 was lethal in Caenorhabditis elegans. [6,17,74-79] However, simultaneous deletion of multiple SR proteins resulted in morphological abnormalities in C. elegans. [6] Interestingly, depletion of the C. elegans ortholog of the SR-specific kinase, ceSRPK, was also lethal, suggesting that proper SR protein functions are essential for development. [6] Conditional deletion models of SR proteins in heart, liver, pituitary gland and the haematopoietic and immune systems caused severe developmental defects. [80][81][82] Mammalian SRPK controls neurodevelopmental gene expression via the ubiquitin signalling pathway. [83] Whether aberrant splicing and/or other functions of SR proteins in RNA metabolism contribute to the developmental deficits has yet to be addressed.

Splicing-dependent and independent functions in stem cells
The importance of SR proteins during development may be linked to their regulation of embryonic stem cell pluripotency, the ability to develop into the three primary germ layers of the early embryo.
While SRSF2 regulates alternative splicing in human pluripotent stem cells, [84] SRSF3 promotes pluripotency independent of splicing. [44] SRSF3 is required for normal development as Srsf3-null mice fail to form blastocysts, [75] and it was also identified as a potential regulator of reprogramming in mouse embryonic fibroblasts. [85] In mice, SRSF3 promotes pluripotency by regulating the nucleo-cytoplasmic export of Nanog mRNA and other (pre-)mRNAs encoding core pluripotency transcription factors. [44] In general, SR protein expression varies between cell types, but it is often higher in undifferentiated cells and decreases as differentiation progresses. [56] In some cases, such as the cardiac and neural lineages, the differences in expression levels can be explained by enhanced inclusion of their own poison exon, leading to NMD in the more differentiated cells. Thus, one challenge in deciphering the precise roles of individual SR proteins in stem cell differentiation is that they act in a cross-regulatory network by controlling their expression through NMD via their own ultraconserved poison exon sequences. [56] Evolutionarily, all prototypical SR proteins share a single ancient origin, [8] and interestingly, more distantly related SR proteins tend to positively regulate each other, while more closely related SR proteins compete by negatively regulating each other's expression. [56] An SR protein network may be particularly important for brain function, as SRSF3, SRSF6, SRSF7 and SRSF9 have been implicated in neurological disorders by regulating splicing of Tau, a microtubule-associated protein with multiple functions required for neuronal formation and health. [86][87][88][89]

How do SR proteins contribute to disease?
Due to their complex functions, highly conserved structure and essential roles in development, SR proteins have also been linked to human disease, most often cancer (Table 1). However, the underlying molecular mechanisms by which they contribute to diseases are difficult to disentangle due to their overlapping roles in several cellular processes and the context-dependent nature of their function.
Over-expression of SRSF1 or SRSF3 is sufficient to transform immortalized fibroblasts, [54,73,120-122] yet the precise mechanisms by which they facilitate transformation remain unknown. SRSF1 is highly expressed in acute lymphoblastic leukemia (ALL) and its overexpression correlated with phosphorylation of tyrosine 19, leading to enhanced proliferation. [123] Tumourigenic functions of SRSF1 are likely due to both nuclear and cytoplasmic roles. SRSF1 supports proliferation by protecting against DNA damage. [124,125] In the cytoplasm, SRSF1 promotes cell division by activating translation through interactions with the mTOR signalling pathway. [126,127] Translational targets of SRSF1 are cell cycle regulators important for accurate chromosome [...]

TABLE 1 SR proteins in human diseases with a focus on genetic variations that were also associated with cellular functions

Protein
Disease Type of cancer Misregulated function
Furthermore, the altered RNA binding of mutant SRSF2 leads to a cascade of alternative splicing events that disrupts the function of other RNA processing factors such as hnRNPs, which antagonise SR protein functions in splicing. [121] Since splicing and transcription are interlinked, it is unsurprising that the mutation in SRSF2 also directly affected transcription. [33,133] Mutations in pre-mRNA splicing factors are commonly linked to MDSs and solid tumours. [134] In search of a common underlying mechanism as to how mutations in splicing factors contribute to cancer progression [...] loops are particularly stable and the single-stranded DNA is exposed to damage, for instance through spontaneous or directed deamination of cytosines to uracils. [135,136] Thus, the persistence of R-loops can impair genome integrity. Similar to reports using SRSF2 knockout cells, mutant SRSF2 also caused Pol II stalling, resulting in R-loop formation and replication stress. [33] Since mutation or loss of SRSF2 is lethal, SRSF2 mutations commonly occur in combination with survival-enhancing mutations in cancer. [32,133] Given that mutations in SRSF2 and other splicing factors alter RNA-binding affinities, it is highly likely that other functions in RNA metabolism, such as changes in mRNA processing, export, NMD or mRNA translation, also contribute to carcinogenesis. [32,133] However, SR proteins work in both a synergistic and antagonistic manner, and how to disentangle those interactions in cancer is a challenging, yet pressing question for the future.

CONCLUSIONS AND OUTLOOK
The SR protein family consists of 12 family members. Initially they were described as splicing factors; however, it is now clear that they are important for most aspects of RNA metabolism. Although the noncanonical functions of SR proteins are diverse with a notable lack of redundancy, at present most studies focus on the splicing functions to decipher the role of distinct SR proteins. In the future, it will be impor-