The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression

Authors


F. H.-T. Allain, Institute for Molecular Biology and Biophysics, Swiss Federal Institute of Technology Zurich, ETH-Hönggerberg, CH-8093 Zürich, Switzerland
Fax: +41 1 6331294
Tel: +41 1 6333940
E-mail: allain@mol.biol.ethz.ch
Website: http://www.mol.biol.ethz.ch/groups/allain_group

Abstract

The RNA recognition motif (RRM), also known as the RNA-binding domain (RBD) or ribonucleoprotein domain (RNP), is one of the most abundant protein domains in eukaryotes. Based on the comparison of more than 40 structures including 15 complexes (RRM–RNA or RRM–protein), we reviewed the structure–function relationships of this domain. We identified and classified the different structural elements of the RRM that are important for binding a multitude of RNA sequences and proteins. Common structural aspects were extracted that allowed us to define a structural leitmotif of the RRM–nucleic acid interface with its variations. Outside of the two conserved RNP motifs that lie in the center of the RRM β-sheet, the two external β-strands, the loops, the C- and N-termini, or even a second RRM domain allow high RNA-binding affinity and specific recognition. Protein–RRM interactions that have been found in several structures reinforce the notion of an extreme structural versatility of this domain supporting the numerous biological functions of the RRM-containing proteins.

Abbreviations

ACF: APOBEC-1 complementary factor
CBP: cap binding protein
CstF: cleavage stimulation factor
hnRNP: heterogeneous nuclear ribonucleoprotein
HuD: Hu protein D
LRR: leucine rich repeat
MIF4G: middle domain of the translation initiation factor 4 G
PABP: polyadenylate binding protein
PIE: polyadenylation inhibition element
PTB: polypyrimidine tract binding protein
RBD: RNA-binding domain
RNP: ribonucleoprotein
RRM: RNA recognition motif
SR: serine/arginine rich proteins
TLS: translocated in liposarcoma
U1A, U2A′, U2B′: U1 snRNP proteins A, A′, B′
U2AF: U2 snRNP auxiliary factor
UHM: U2AF homology motif
UPF: up-frameshift protein

History – what defines an RRM?

The RNA recognition motif (RRM), also known as the RNA-binding domain (RBD) or ribonucleoprotein domain (RNP), was first identified in the late 1980s when it was demonstrated that mRNA precursors (pre-mRNA) and heterogeneous nuclear RNAs (hnRNAs) are always found in complex with proteins (reviewed in [1]). Biochemical characterizations of the mRNA polyadenylate binding protein (PABP) and the hnRNP protein C shed light on a consensus RNA-binding domain of approximately 90 amino acids containing a central sequence of eight conserved residues that are mainly aromatic and positively charged [2,3]. This sequence, termed the RNP consensus sequence, was thought to be involved in RNA interaction and was defined as Lys/Arg-Gly-Phe/Tyr-Gly/Ala-Phe/Tyr-Val/Ile/Leu-X-Phe/Tyr, where X can be any amino acid. Later, a second consensus sequence, less conserved than the first [1], was identified. This six-residue sequence, located at the N-terminus of the domain, was defined as Ile/Val/Leu-Phe/Tyr-Ile/Val/Leu-X-Asn-Leu. The first consensus sequence was therefore referred to as RNP 1 and the second as RNP 2 (Fig. 1). It was then shown that this protein domain was necessary and sufficient for binding RNA molecules with a wide range of specificities and affinities (reviewed in [4–6]).
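The two consensus definitions above translate directly into pattern searches. The following is a minimal sketch in Python, assuming one-letter amino acid codes; the function name `find_rnp_motifs` and the test sequence are hypothetical and chosen only for illustration. Because many real RRMs deviate from the strict consensus, a literal pattern search of this kind will miss some genuine domains.

```python
import re

# Consensus patterns transcribed from the text (one-letter amino acid codes).
# RNP 1: Lys/Arg-Gly-Phe/Tyr-Gly/Ala-Phe/Tyr-Val/Ile/Leu-X-Phe/Tyr
RNP1 = re.compile(r"[KR]G[FY][GA][FY][VIL].[FY]")
# RNP 2: Ile/Val/Leu-Phe/Tyr-Ile/Val/Leu-X-Asn-Leu
RNP2 = re.compile(r"[IVL][FY][IVL].NL")


def find_rnp_motifs(sequence):
    """Return {motif name: (1-based start, matched text)} for the first hit of each motif."""
    hits = {}
    for name, pattern in (("RNP2", RNP2), ("RNP1", RNP1)):
        match = pattern.search(sequence)
        if match:
            hits[name] = (match.start() + 1, match.group())
    return hits


# Hypothetical toy sequence built to contain both motifs; it is not a real protein.
toy_sequence = "MSKLFVGNLPAAAAKGFGFVTFEE"
print(find_rnp_motifs(toy_sequence))
```

In this toy sequence RNP 2 precedes RNP 1, which mirrors their placement on β1 and β3 of the domain.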

Figure 1.

Sequence alignment of a selection of RRM domains for which the structure has been solved (PDB codes are indicated in brackets). The alignment was generated by the program ClustalW (http://www.ebi.ac.uk/clustalw/) [55] and manually optimized. The conserved RNP 1 and RNP 2 sequences are displayed in yellow. The amino acids highlighted in boxes refer to the aromatic residues important for primary RNA binding.

Here we review the structural properties of the RRM domain in its isolated form and in complex with RNAs and/or proteins. This review shows how such a simple domain can modulate its fold to recognize many RNAs and proteins in order to achieve a multitude of biological functions often associated with post-transcriptional gene regulation.

An abundant and ancient fold with multiple biological functions

Genome sequencing projects have shown that the RRM is found in all kingdoms of life, including prokaryotes and viruses, although at lower abundance than in eukaryotes. To date, only 85 proteins containing an RRM domain in bacteria (mostly cyanobacteria [7]), and six such proteins in viruses have been identified. Prokaryotic RRM proteins are rather small (about 100 amino acids) and have a single copy of the RRM domain. In eukaryotes, the RNA recognition motif is one of the most abundant protein domains. To date, a total of 6056 RRM motifs have been identified in 3541 different proteins (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00076) [8]. In humans, 497 proteins containing at least one RRM have been identified. Assuming about 20 000–25 000 human genes, the RRM would therefore be present in about 2% of gene products. In eukaryotic proteins, RRMs are often found as multiple copies within a protein (44%, two to six RRMs) and/or together with other domains (21%). Among the latter, the most abundant are the zinc fingers of the CCCH and CCHC type (21% of those with an additional domain), the polyadenylate binding protein C-terminal domain (PABP or PABC, 10%), and the WW domain (9%). Interestingly, in contrast to the well-known CCHH zinc fingers that bind double-stranded DNA or RNA, the CCCH and CCHC zinc fingers are domains that bind single-stranded RNA [9,10]. The PABP and the WW domains [11] are protein–protein interaction domains involved in translation [12,13] and pre-spliceosome formation, respectively [14]. By association with different types of protein domains, the RRM domain can modulate its RNA-binding affinity and specificity and diversify its biological functions.
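The "about 2%" estimate follows from simple division, which can be checked as below. This is a rough sketch that assumes one gene product per gene, an assumption made here only for the arithmetic; the variable names are illustrative.

```python
# Counts quoted in the text.
human_rrm_proteins = 497
gene_count_low, gene_count_high = 20_000, 25_000
total_rrm_motifs = 6056
total_rrm_proteins = 3541

# Fraction of human gene products carrying at least one RRM,
# assuming one product per gene (a simplification).
fraction_upper = human_rrm_proteins / gene_count_low   # fewer genes, larger fraction
fraction_lower = human_rrm_proteins / gene_count_high  # more genes, smaller fraction

# Average number of RRM motifs per RRM-containing protein.
motifs_per_protein = total_rrm_motifs / total_rrm_proteins

print(f"{fraction_lower:.1%} to {fraction_upper:.1%} of gene products")
print(f"{motifs_per_protein:.2f} RRMs per RRM-containing protein on average")
```

The range works out to roughly 2.0–2.5%, consistent with the rounded figure quoted in the text.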

A protein domain in such abundance is necessarily biologically important and associated with many functions in the cell. Indeed, eukaryotic RRM proteins are present in all post-transcriptional events: pre-mRNA processing (for example CstF-64, LA, or UPF3 proteins), splicing (U2B′, U2AF35, U2AF65, hnRNPA1 or Y14 proteins), alternative splicing (hnRNPA1, PTB, sex-lethal, SR proteins), mRNA stability (CBP20, PABP or HuD), RNA editing (ACF), mRNA export (TLS), pre-rRNA complex formation (nucleolin), translation regulation (PABP) and degradation [6]. In plants, RRM proteins are present in chloroplasts and are involved in 3′ end processing of chloroplast mRNA [15]. They have also been discovered in plant mitochondria. Their functions, however, remain unclear [16]. Similarly, their roles in bacteria and viruses are still unknown. The numerous three-dimensional structures of the RRM in isolation, and in complex with RNA or other proteins, shed light on the function of RRM proteins, as shown below.

The structure of the RRM, a βαββαβ fold with some variations and extensions

The RRM folds into an αβ sandwich structure with a β1α1β2β3α2β4 topology (Figs 1 and 2) as demonstrated by the first structure of an RNA recognition motif, the N-terminal RRM of U1A [17]. The fold is composed of one four-stranded antiparallel β-sheet spatially arranged in the order β4β1β3β2 from left to right when facing the sheet (Fig. 2, hnRNP A1-RRM 2, front view) and two α-helices (α1 and α2) packed against the β-sheet. Most of the conserved residues of the RRM are in the hydrophobic core of the domain [17] except four conserved residues that contribute to RNA binding, namely RNP 1 positions 1, 3 and 5 and RNP 2 position 2 (see the following section and Fig. 1). The RNP 1 and RNP 2 motifs are located in the central strands of the β-sheet, namely β3 and β1, respectively, and are highly conserved apart from a few RRM domains such as ALY and TAP (Fig. 1) [18,19].

Figure 2.

hnRNPA1 RRM 2, a typical RRM fold and its structural variations as illustrated by these different protein structures (hnRNPA1 RRM 2 [52], PTB RRM 3 [23], La C-terminal [20], CstF-64 RRM [22] and U2AF35 [51]). This figure was generated with the program MOLMOL [56].

To date, more than 30 RRM structures have been determined either by NMR or X-ray crystallography and reveal unexpected variations as shown in Fig. 2. The loops between the secondary structure elements (loops 1–5 as indicated in Figs 1 and 2) can have different lengths and are often disordered in the free form. An exception to this is loop 5, which often forms a small two-stranded β-sheet (β3′ and β3″) (Fig. 2). The N- and C-terminal regions, outside the RRM, are usually poorly ordered in the isolated domains with a few exceptions where they can adopt a secondary structure (Fig. 2, PTB-RRM 3, La C-terminal RRM and CstF-64). In the structures of La C-terminal RRM [20], U1A N-terminal RRM [21] and CstF-64 RRM [22], the C-terminus forms an α-helix that lies on the β-sheet surface, while in PTB-RRM 2 and 3 it extends the size of the β-sheet by forming an extra β-strand (β5) antiparallel to β2 [23,24]. CstF-64 RRM also has an additional short α-helix in its N-terminal region (Fig. 2) [22]. Finally, secondary structure elements of the domain can be modified; for example, α-helix 1 in U2AF35 RRM is three times longer than in a canonical RRM (Fig. 2). This unusual helix 1 is involved in protein–protein interactions [25] (see the RRM–protein complexes section).

A true single-stranded nucleic acid binding domain

Since the first structure of an RRM in complex with RNA (the N-terminal domain of U1A in complex with U1snRNA stem-loop II [26]) that founded our understanding of RRM–RNA recognition, 10 structures of RRMs in complex with RNA or DNA (for hnRNPA1) have been determined either by NMR [27–30] or X-ray crystallography [31–36]. All of the structures present intrinsic common features and differences in RNA recognition reflecting the remarkable adaptability of this domain in order to achieve high affinity and specificity.

Systematic visual analysis of the conserved residues at the RRM–RNA interface for all 11 published complexes led us to define a common structural archetype of the RRM–nucleic acid interaction exemplified by hnRNPA1, an RRM protein binding both DNA and RNA with high affinity. In the structure of hnRNPA1 RRM 2 in complex with DNA [34] (Fig. 3A), two deoxynucleotides, A209 and G210, stack on two aromatic rings located on the β1 (Phe108, RNP 2 position 2) and β3 (Phe150, RNP 1 position 5) strands, respectively (Fig. 3A). The contacts with these two RNP positions result in a characteristic arrangement of the nucleic acid strand on the β-sheet surface in which the 5′ end is located on the first half of the β-sheet (β4β1) and the 3′ end on the second half (β3β2) (Fig. 3B). A third aromatic residue located on β3 (Phe148, RNP 1 position 3) interacts hydrophobically with the sugar rings of A209 and G210. Finally, a positively charged side chain (Arg146, RNP 1 position 1) forms a salt bridge with the phosphate between A209 and G210. This small set of RRM–nucleic acid interactions, in the center of the domain, involving four conserved protein side chains of the RRM consensus sequence and two nucleotides, illustrates the perfect adaptation of the RRM for effectively binding single-stranded nucleic acids of any sequence. Indeed, the essential chemical elements of this dinucleotide, namely the two bases, the two sugar rings and the phosphate in between, are recognized. The two bases are stacked on conserved aromatic rings, and correspondingly, RNP 2 position 2 and RNP 1 position 5 are planar residues (Phe, Tyr, His or Trp) in 78% and 72% of the 70 RRMs studied by Birney et al. [6], respectively. The two sugar rings are in contact with a hydrophobic side chain (RNP 1 position 3) that is present in 81% (67% of Phe or Tyr) of the RRMs and finally the negatively charged phosphodiester group is neutralized by a positively charged side chain (RNP 1 position 1) present in 68% of the RRMs [6].
Although the residue conservation at these four positions is strong, these four characteristic contacts are not always found all together [34]. Among the RRM–RNA/DNA complexes, the two RRMs of hnRNPA1 in complex with DNA have all four characteristic contacts, whereas only one to three of those are found in the other structures (Fig. 4). The most frequent ones are the two stacking interactions involving RNP 2 position 2 (always present except in nucleolin RRM 2 [37]) and RNP 1 position 5 (always present except in CBP 20 [36]). The contacts between the sugars and RNP 1 position 3 are present in five RRM–RNA complexes (CBP20, PABP RRM 1, nucleolin RRM 1 and RRM 2 and sex-lethal RRM 1). The RNP 1 position 1 residue does not necessarily interact with the phosphate between the dinucleotide because in all structures apart from hnRNPA1 it contacts an RNA base or a phosphate oxygen of other nucleotides. Also, the RRM interactions with the sugar–phosphate backbone are fairly limited compared to other types of RNA-binding proteins, such as ribosomal proteins, suggesting a less important role for this type of interaction [38].
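The four canonical contacts can be summarized as a small lookup table and used to check whether a given pair of RNP motifs carries a suitable residue at each position. The sketch below is illustrative only: the residue sets for "planar" and "positive" follow the text, the "hydrophobic" set is an assumption made here, and the function name and example motif strings are hypothetical. Such a check reflects sequence only and cannot say whether a contact is actually formed in a given complex.

```python
# Residue classes; "planar" and "positive" follow the text,
# the "hydrophobic" set is an assumption made for this sketch.
RESIDUE_CLASSES = {
    "planar": set("FYHW"),
    "hydrophobic": set("FYWLIVM"),
    "positive": set("KR"),
}

# (motif, 1-based position) -> (required class, role described in the text)
CANONICAL_CONTACTS = {
    ("RNP2", 2): ("planar", "stacks on the 5' base of the dinucleotide"),
    ("RNP1", 5): ("planar", "stacks on the 3' base of the dinucleotide"),
    ("RNP1", 3): ("hydrophobic", "contacts the two sugar rings"),
    ("RNP1", 1): ("positive", "neutralizes the bridging phosphate"),
}


def check_contacts(rnp1, rnp2):
    """Report, for each canonical position, the residue found and whether it fits the class."""
    motifs = {"RNP1": rnp1, "RNP2": rnp2}
    report = {}
    for (motif, position), (residue_class, _role) in CANONICAL_CONTACTS.items():
        residue = motifs[motif][position - 1]
        report[(motif, position)] = (residue, residue in RESIDUE_CLASSES[residue_class])
    return report


# Hypothetical motif strings: one matching the consensus, one deliberately degenerate.
consensus_like = check_contacts("KGFGFVTF", "LFVGNL")
degenerate = check_contacts("SGLGSVTF", "LSVGNL")
print(consensus_like)
print(degenerate)
```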

Figure 3.

hnRNPA1 RRM 2 as a model of single-stranded nucleic acid binding [25]. (A) Structure of hnRNPA1 RRM 2 in complex with single-stranded telomeric DNA and scheme of the β-sheet annotated with the conserved RNP 1 and RNP 2 aromatic residue positions numbered according to each RNP sequence numbering. The conserved aromatic residues are highlighted by green circles [34]. (B) Structural arrangement of the DNA strand on the β-sheet of hnRNPA1–RRM 2. (C) Hydrogen bond and van der Waals interaction network conferring base-binding specificity (hnRNPA1–RRM 2 complex). This figure was generated with the program MOLMOL [56].

Figure 4.

The RRM domain, a highly plastic platform for nucleic acid binding. (A) Nucleolin RRM 2-sNRE complex [28]. (B) Sex-lethal RRM 1–polyU–Tra mRNA [31]. (C) Sex-lethal RRM 2–Tra mRNA precursor complex [31]. (D) hnRNPA1 RRM 1–telomeric DNA complex [34]. (E) Poly(A)-binding protein RRM 1–polyadenylate RNA complex [33]. (F) Heterodimeric nuclear cap binding complex 5′ capped polymerase II transcripts [36]. In all figures, the RNA is shown in yellow and the protein side chain in green. The ribbon of the RRM is shown in grey. The N- and C-terminal extensions of the RRM are shown in green and red, respectively. This figure was generated with the program MOLMOL [56].

This basic binding platform common to all RRMs is not in essence sequence-specific as eight of the 16 dinucleotide combinations have already been found: AA [33], AG [34], CG [28], CA [26], GU [31], UC [28], UG (S. D. Auweter and F. H.-T. Allain, unpublished data) and UU [31], with any type of nucleotide either at the 5′ or the 3′ position. The nucleotides at these two positions always adopt an anti conformation, except for G at the 3′ position, which is always found in a syn conformation. Specificity of this central dinucleotide recognition is provided by other non-conserved elements of the RRMs. The two most frequently observed elements are the protein side chains at the surface of the β-sheet (RNP 1 position 7 and the two adjacent positions in β1) (Fig. 3A) and the backbone and side chains of the few amino acids just C-terminal to β4. These residues are base-specifically hydrogen-bonded to the RNA or DNA functional groups as illustrated by the multiple base–amino acid contacts in hnRNPA1 RRM 2 (Fig. 3C).
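The count of eight out of 16 can be reproduced by enumerating all dinucleotides and subtracting the observed ones. The short Python sketch below does this using the RNA alphabet; note that the AG example in the text comes from a DNA complex, so treating all entries as RNA is a simplification made here.

```python
from itertools import product

BASES = "ACGU"

# All 4 x 4 = 16 possible dinucleotides.
all_dinucleotides = {five + three for five, three in product(BASES, repeat=2)}

# Central dinucleotides reported in the text as observed on the RRM β-sheet.
observed = {"AA", "AG", "CG", "CA", "GU", "UC", "UG", "UU"}

unobserved = sorted(all_dinucleotides - observed)

# Bases seen at the 5' and at the 3' position of the observed dinucleotides.
bases_at_five_prime = {dinucleotide[0] for dinucleotide in observed}
bases_at_three_prime = {dinucleotide[1] for dinucleotide in observed}

print(len(observed), "of", len(all_dinucleotides), "observed")
print("not yet observed:", unobserved)
print("all four bases seen at 5':", bases_at_five_prime == set(BASES))
print("all four bases seen at 3':", bases_at_three_prime == set(BASES))
```

Both position checks evaluate to true for the listed set, in line with the statement that any nucleotide type can occupy either position.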

A highly plastic domain to achieve high RNA-binding affinity and specificity

Many RRMs bind RNA with high affinity (in the nM range) and high sequence-specificity, in particular all those whose structures have been determined to date. Nevertheless, sequence-specificity does not necessarily imply high affinity; for example, PTB specifically recognizes pyrimidine tracts but does not provide sufficient binding enthalpy to reach nM affinity (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data). To achieve higher affinity, some RRM proteins use the two external β4 and β2 strands, while others use the loops 1, 3 or 5, or the C- and N-termini [39]. In many proteins, multiple RRMs associate to bind longer nucleotide stretches. In these cases, the interdomain linker is an essential component of RNA recognition. In addition, the RNA secondary structure can be an important determinant of the protein binding affinity. All of these aspects are presented in detail below.

Role of the two external β-strands and the loops

The β-sheet surface of an RRM can be modulated by using only one or up to four β-strands for RNA binding. Figure 4 clearly illustrates that the β-sheet surface is not used to the same extent in each RRM–nucleic acid complex. Exceptionally, in hnRNPA1 RRM 1, each β-strand binds one nucleotide, the DNA being spread on the β-sheet from β4 to β2 in the 5′−3′ direction. More often, the nucleotide at the 5′ end of the central dinucleotide contacts the loops at the bottom of the β-sheet (loop 1 and loop 3 in particular, Fig. 4C) and the one at the 3′ end stacks over the previous nucleotide (Fig. 4A). In PABP RRM 1, it is different again; while A6 and A8 stack on the protein side chains at the canonical positions on β1 and β3, respectively, the nucleotide in between, A7, interacts with loop 3 (Fig. 4E).

Role of the N- and C-terminal regions

The N- and C-terminal regions of the RRM are often of crucial importance to dramatically enhance the RNA-binding affinity by increasing the protein–RNA interaction network. In most RRM–RNA complexes, the base stacking on the aromatic residue at RNP 2 position 2 is sandwiched either by a protein side chain from the N-terminal region (CBP20) or by one from the C-terminal region of the RRM (Fig. 4D–F) [36]. This side chain can be one residue after the end of β4 as in U1A [26,27] or 16 residues afterwards as in hnRNPA1 RRM 1 [34] (Fig. 4D). The C-terminus of hnRNPA1 RRM 1 is particularly interesting because it is unstructured in the free form and becomes ordered upon DNA binding, forming a 3₁₀ helix. This structural rearrangement reinforces the concept of binding by induced fit, initially proposed with the structure of the U1A–RNA complex [27]. Side chain residues of this helix, His101 and Arg92, stack over A203 and G204, respectively (Fig. 4D) [34].

The C-terminus can also contribute to differentiating RNA from DNA by interacting with the 2′OH group of the sugar ring as shown in Fig. 4B,E. The hydroxyl group can act as a hydrogen bond acceptor interacting with protein side chains (Fig. 4E, Arg94; Fig. 4B Arg202) as well as with the backbone amide (Fig. 4B, Gly205) and/or as a hydrogen bond donor interacting with the carbonyl oxygen of the protein backbone [38]. Other parts of the RRM domain, such as the β2-strand and the loops, also interact with the 2′OHs and help to discriminate RNA from DNA [26,31,33,35].

The C-terminal region does not always enhance, but can also inhibit RNA binding as shown in the structure of CBP20 [36] (Fig. 4F). Two residues (Asp116 and Arg123) of the C-terminus form a salt bridge located above the RNP 1 residue at position 5 (Phe85) preventing any RNA binding at this key position. Similarly in PTB, the C-terminal region of all the RRMs hydrophobically interacts with RNP 1 position 5, thereby masking this binding site (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data).

Role of the RNA secondary structure in RRM binding

Some proteins such as the N-terminal RRM of U1A bind single-stranded RNA with high affinity only if the RNA is embedded within a secondary structure, stem loop (hairpin loop II of U1 snRNA [26]) or internal loop (the regulatory element of the U1A 3′ untranslated region [27]). For example, the U1A protein that recognizes a stem loop has a much weaker affinity (10⁴-fold) for a single-stranded 23-mer RNA with no base pairs, even though the proper single-stranded recognition sequence is present [26]. U1A RRM 1 specifically recognizes the secondary structure of the target RNA through its loops 1 and 3 binding to a specific base pair. In the case of U1A bound to a fragment of U1 snRNA hairpin II, Arg52 (loop 3) makes crucial interactions with the closing loop GC base pair and its substitution to Glu completely abolishes RNA binding [26] (Fig. 5A). U1A not only binds a stem loop but also an internal loop [27,29]. This ability to bind RNA in different environments shows the adaptability of the proteins to recognize different secondary structures as long as the key protein–RNA interactions are conserved. The closely related U2B′ RRM binds the same hexanucleotide sequence, AUUGCA, as U1A but within a different stem loop (U2 snRNA hairpin IV) and only when in complex with U2A′ (Fig. 5B). The adaptability of the RRM domain is further illustrated here, as the key residue Arg52 still interacts with the RNA stem although the closing base pair is a UU base pair in U2snRNA SLIV instead of a GC in U1snRNA SLII.

Figure 5.

Role of the RNA secondary structure in RRM binding. (A) U1A spliceosomal protein–U1 snRNA hairpin II complex [26]. (B) U2B′–U2A′ protein complex bound to U2 snRNA hairpin IV [32]. (C) Nucleolin–sNRE complex [28]. The loop E motif is composed of a sheared G5-A18 base pair, an A6-U17-G16 base triple and a symmetric (trans-Hoogsteen) locally parallel A7-A15 base pair. (D) Nucleolin–b2NRE complex with the loop E motif substituted by a bulge (U15 between two GC base pairs) [30]. The color schemes are the same as in Fig. 4, except that the protein loops and the C-terminus are shown in blue. This figure was generated with the program MOLMOL [56].

While both U1A and U2B′ recognize the bases at the top of the stem through numerous hydrogen bonds, nucleolin contacts the nucleolin recognition element (sNRE) RNA stem essentially by van der Waals interactions [28] (Fig. 5C). The two RRMs of nucleolin sandwich the seven-nucleotide loop, and RRM 1 and its C-terminal part recognize the unusual loop E structure [28]. The substitution of the loop E by two GC base pairs separated by a bulge increases the dissociation constant more than 100-fold (from 5 nM to 0.8 µM) [30] and, as shown in Fig. 5D, this substitution abolishes all van der Waals interactions (only one hydrogen bond from Lys95 is retained). The double-stranded stem is important for two reasons: first, it restricts the conformation of the RNA loop and reduces the entropy loss accompanying protein binding; and second, some structural features of the RNA, such as the base pair (U1A and U2B′) or loop E (nucleolin) that closes the RNA loop, are crucial for positioning the RRM onto the RNA. It was postulated that the RNA structure is essential because it induces conformational changes in order to reach the bound state [27,40].
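The size of the loop E effect can be put in energetic terms using the standard relation ΔΔG = RT ln(Kd,mutant / Kd,wild type). The sketch below applies it to the two dissociation constants quoted above. The temperature of 298 K is an assumption made here, since the measurement conditions are not stated in this text, and the function names are illustrative.

```python
import math

GAS_CONSTANT_KCAL = 1.987e-3  # kcal / (mol * K)
TEMPERATURE_K = 298.0         # assumed; not stated in the text


def fold_change(kd_variant_molar, kd_reference_molar):
    """Ratio of dissociation constants (greater than 1 means weaker binding)."""
    return kd_variant_molar / kd_reference_molar


def binding_energy_penalty(fold, temperature_k=TEMPERATURE_K):
    """Loss in binding free energy (kcal/mol) for a given fold increase in Kd."""
    return GAS_CONSTANT_KCAL * temperature_k * math.log(fold)


kd_loop_e = 5e-9    # 5 nM, nucleolin with the loop E RNA
kd_bulge = 0.8e-6   # 0.8 µM, loop E replaced by a bulge

fold = fold_change(kd_bulge, kd_loop_e)
penalty = binding_energy_penalty(fold)

print(f"Kd increases {fold:.0f}-fold")
print(f"binding free energy lost: about {penalty:.1f} kcal/mol at {TEMPERATURE_K:.0f} K")
```

Under these assumptions the 160-fold change corresponds to roughly 3 kcal/mol of lost binding free energy.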

Role of additional RRMs

The combination of two or more RRM domains allows the continuous recognition of a long nucleotide sequence (8–10 nucleotides), often drastically increasing the affinity (Kd < nM). As shown previously, the β-sheet surface can bind up to four nucleotides and up to six if loops 1 and 3 contribute extensively to binding (S. D. Auweter and F. H.-T. Allain, unpublished data). Thus, recognition of a longer single-stranded DNA or RNA requires more than one RRM to form a larger binding platform. Four structures of two consecutive RRMs in complex with RNA (sex-lethal [31], HuD [35], PABP [33] and nucleolin [28,30]) and one with DNA (hnRNPA1 [34]) have been determined. In all five cases, the two RRMs and the interdomain linker cooperatively bind the nucleic acid, providing high affinity and specificity. In the free forms of sex-lethal and nucleolin, the linkers are disordered and the two RRM domains tumble independently [37,41]. In some cases (PABP, nucleolin), the interdomain linker (that is the C-terminal region of the N-terminal RRM as described above) acts as a bridge, mediating the cooperative binding of two RRM domains with the RNA. More interesting is the range of new possible conformations provided by the association of two RRMs (Fig. 6). In PABP, a large binding platform is created for the RNA; in sex-lethal and HuD, the two RRMs form a cleft in which the RNA lies; and in nucleolin the RNA is sandwiched between the RRMs. As a consequence of the relative arrangement of the two domains in sex-lethal, HuD and nucleolin, several intra-RNA interactions are created upon RNA binding that contribute to the overall enthalpy of the complex, while in PABP almost no intra-RNA interactions are present. In contrast, hnRNPA1 RRMs 1–2 and PTB RRMs 3–4 (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished results) are arranged in such a way that only distantly located RNA sequences of the same RNA can bind simultaneously to both RRMs.
These totally opposite topologies might reflect the opposite function of the various RRM proteins, as both sex-lethal and HuD are splicing activators, while hnRNPA1 and PTB are splicing repressors [42].

Figure 6.

The RRM–RRM interactions. Several protein structures either free or in a complex in which two RRM domains interact are shown. Structures of (A) UP1 in the free form [53] (pdb:1up1), (B) nucleolin in complex with RNA [28] (pdb:1fje), (C) sex-lethal in complex with RNA [31] (pdb:1b7f), (D) PABP in complex with RNA [33] (pdb:1cvj), and (E) U1A homodimer in complex with RNA [29] (pdb:1dz5). The RNA backbone is shown in yellow (A–E), the N-terminal RRM domain is displayed green, C-terminal domain blue, and linker region red. (F) One monomer of U1A is displayed green and the other blue. In all cases, important residues for the protein–protein interaction are displayed as balls and sticks. This figure was generated using the programs MOLSCRIPT and Raster3D [57,58].

The RRM, also a protein–protein interaction domain

Over the last few years, biochemical and structural studies have shown that the RRM is not only involved in RNA recognition but also in protein–protein interaction. In addition to structures of multiple RRM-containing proteins as described in the previous section, structures of RRM domains in complex with various proteins or domains have been solved [32,43–51]. Analysis of these structures shows that protein recognition by RRM domains is very diverse with no general mechanism emerging. For clarity, we distinguish three main classes of RRM–protein interactions: between two RRMs, between an RNA-binding RRM and a non-RRM protein, and finally between an RRM that does not bind RNA and another protein.

Protein interaction involving two RRM domains

The first structure showing an interaction between two RRMs is the N-terminal region of hnRNPA1 (UP1) in its free form that contains two RRM domains separated by a short linker [52,53]. The two RRMs form a compact fold and interact with each other via their α-helix 2. The interaction is stabilized by two salt bridges connecting two arginines of the first RRM and two aspartic acids of the second (Fig. 6A). This arrangement places the β-sheets of both domains side by side, forming an extended surface of eight β-strands. Similarly, PTB RRMs 3 and 4, separated by a 24-residue linker region, do not tumble independently in the free form (F. C. Oberstrass, S. D. Auweter and F. H.-T. Allain, unpublished data).

These RRM–RRM interactions are not a general feature of all RRM proteins. In the case of sex-lethal and nucleolin, in the free proteins, the linker is flexible and the two RRM domains are independent [28,41]. However, upon RNA binding, the two RRM domains adopt a fixed orientation and contact each other. In the nucleolin structure, the RRMs interact via two salt bridges located in the loops (Fig. 6B) and in the structure of hnRNPA1, the RRMs interact by salt bridges located in the α2-helix. Other examples of RNA inducing RRM–RRM interactions have also been described in the case of sex-lethal [31], PABP [33], and HuD [35]. In sex-lethal and HuD, the interdomain interaction is mainly governed by two hydrogen bonds between residues located in β1 and β4 of RRM 1 and in β2 of RRM 2 (Fig. 6C). Furthermore, additional contacts between RRM 2 and the linker region are observed. In the case of PABP, the interdomain interactions are mediated through many salt bridges and van der Waals contacts between α2 and β4 of RRM 1 and β2 and α1 of RRM 2, respectively (Fig. 6D).

Another interesting example of RRM–RRM interaction is found in the structure of the N-terminal RRM domain of the U1A protein in complex with the polyadenylation inhibition element (PIE) RNA [29]. In this case, two U1A proteins bind cooperatively to the PIE RNA [54]. The structure shows that when bound to RNA, U1A RRM 1 forms a homodimer stabilized by interactions between the two α-helical C-termini (Fig. 6E). On one side the C-terminal α-helix contains charged residues that interact with the RNA and on the opposite side contains hydrophobic residues that constitute the dimer interface.

All of these structures clearly show that RRM domains can be involved in RRM–RRM interaction in addition to RNA binding. In most of these complexes, these additional interactions contribute to the formation of a larger RNA-binding interface and are therefore critical to reach high RNA-binding affinity and specificity. This feature is likely to be frequently found in multiple RRM-containing proteins, especially if the interdomain linker is short.

Protein interaction involving one RRM domain and another domain

In some cases, it has been demonstrated that RRM-containing proteins can associate with RNA only in the presence of another protein that acts as a cofactor. Both U2B′ and CBP20 need a cofactor, U2A′ and CBP80, respectively, to recognize RNA. Ternary structures of these complexes have been solved that partially explain the importance of a cofactor in RNA–RRM binding [32,43–45]. U2A′ consists of five consecutive leucine-rich repeats, and CBP80 of three helical hairpin repeats very similar to the fold of the middle domain of the translation initiation factor 4G (MIF4G) domain. In both cases, the RRM domains of U2B′ and CBP20 interact with the leucine rich repeat (LRR) motif or the MIF4G domain through their α-helices and loop 4, keeping the β-sheet accessible for RNA-binding (Fig. 7). The interactions, however, are different as they are governed mainly by hydrophobic contacts in the U2B′–U2A′ complex, and salt bridges and hydrogen bonds in the CBP20–CBP80 complex. Furthermore, in the case of CBP20, the N- and C-terminal extensions flanking the RRM domain become structured only when in complex with both RNA and CBP80. As for RRM–RRM interactions, these RRM–protein interactions contribute to RNA-binding specificity, U2A′ contacting the RNA and CBP80 stabilizing both the N- and C-termini of CBP20 RRM, two key components of CBP20–RNA recognition (Fig. 4) [44].

Figure 7.

The RRM–protein–RNA trimolecular complexes. (A) The U2B′–U2A′–RNA ternary complex [32]. (B) The CBP20–CBP80–RNA complex [36]. The RNA is shown in yellow, the RRM domain in green, and leucine-rich repeats or MIF4G domains in blue. Residues important for the interaction are displayed as balls and sticks. This figure was generated using the programs MOLSCRIPT and Raster3D [57,58].

RRM domains involved only in protein recognition

Some proteins containing RRM domains are involved in protein–protein but not in protein–RNA interactions. Recently, three-dimensional structures of such proteins in complex with their partners partially explained this unexpected behavior of the RRM domain. Two different situations, however, have been reported. In one case, the protein interaction involves the β-sheet of the RRM domain, thus preventing RNA binding as in the Y14–Magoh complex [46–49] or the UPF2–UPF3 complex [50]. In a second case, the interaction is mediated through the α-helices, leaving the β-sheet solvent-exposed and therefore theoretically able to bind RNA, as with the U2AF35–U2AF65 [51] and the U2AF65–SF1 complexes [46]. In this latter case, it was postulated that the particular behavior of these RRM domains is due mainly to the identity of the amino acids on the surface of the β-sheet (see below [25]).

Y14 and Magoh proteins are part of the exon junction complex that comprises several proteins. Y14 and Magoh form a highly stable complex with nanomolar binding affinity [48]. The C-terminal domain of Y14 has a typical RRM fold and the RNP 1 and RNP 2 amino acid sequences of Y14 are very similar to those of other RRM domains (Fig. 1). However, Y14 does not bind RNA. Structures of the Y14–Magoh heterodimer show that Y14 binds Magoh through its entire β-sheet [46–48] (Fig. 8). This mode of complex formation could explain why some RRM domains do not have RNA-binding activity. Similarly, in the structure of the UPF2–UPF3 complex involved in nonsense-mediated mRNA decay, the β-sheet of the N-terminal RRM domain of UPF3 binds UPF2 [50]. Although the two RRM proteins both interact through their β-sheet, their partners, Magoh and UPF2, adopt completely different folds. UPF2 has a totally α-helical MIF4G fold very similar to that of CBP80, while Magoh has an αβ fold (Fig. 8). Also striking is the fact that both UPF2 and CBP80 adopt a MIF4G fold but recognize the RRM in totally different manners, UPF2 recognizing the RRM β-sheet and CBP80 the RRM α-helices.

Figure 8.

The Y14–Magoh complex [48]. Y14 is shown in green, and Magoh is shown in blue. The RNP 1 and 2 of Y14 are shown in red. This figure was generated using the programs molscript and raster3d [57,58].

The structures of the splicing factor complexes U2AF35–U2AF65 and U2AF65–SF1 are another example of the diversity encountered in protein–RRM recognition. U2AF65 contains three RRM domains, the two N-terminal domains binding RNA while the C-terminal domain mediates SF1 interaction. U2AF35 contains a central RRM domain flanked by two zinc finger domains. The structures of the U2AF35 RRM in complex with the N-terminal domain of U2AF65 and of the C-terminal RRM of U2AF65 in complex with the N-terminal domain of SF1 have been solved [46,51]. Surprisingly, in these cases the protein interaction does not involve the β-sheet of the RRM domain, as it does for the other non-RNA-binding RRM domains, but rather the two α-helices. Analysis of the RRM fold in these two structures shows striking differences from the canonical RRM domains, consisting mainly of a longer helix α1 (Fig. 2) and the absence of aromatic residues in the RNP 1 and 2 motifs. The authors therefore proposed a novel class of protein recognition motif that they named the U2AF homology motif (UHM) [25].

The examples described above define a novel class of RRM domains that are involved in protein but not RNA interactions, suggesting that RRM domains might have evolved from RNA to protein recognition. Although these RRM proteins do not bind RNA, they are all implicated in RNA-related functions such as recognition of the exon junction (Y14), mRNA decay (UPF3) or pre-mRNA splicing (U2AF35 and U2AF65). This evolutionary process can be accompanied by amino acid substitutions in the RNA-binding regions, namely RNP 1 and 2, as proposed for the UHM domain. However, in the case of Y14 and UPF3, it is not entirely clear why these RRM domains that are very similar to the classical ones favor interaction with proteins rather than RNA.

Conclusion and perspectives

The RNA recognition motif is an abundant and very diverse protein motif found mainly in eukaryotes. Analysis of the structures of this domain in the free form as well as in complex with both RNA and proteins shows that this small domain is extremely diverse in terms of both structure and function. We are only now starting to understand the structural, functional and evolutionary aspects of this domain. It is now clear that the original perception of the RRM as a simple rigid RNA-binding domain must evolve and that further biochemical and structural studies are needed to obtain a full picture of its role in the cell. Structures of RRM domains in complex with different RNAs show that this small compact domain is a central component of RNA recognition but not the only determinant. N- and C-terminal extensions, multiple RRM domains or protein cofactors can play an important role in RNA-binding specificity. This review also raises many questions concerning this domain. First, concerning RNA binding, analysis of the different structures shows that although some conserved aromatic residues are always found at the interface, the topology of the bound RNA is quite different in each complex and the sequence specificity cannot easily be predicted. Thus, more structures of RRM–RNA complexes are needed to fully understand the determinants of this specificity. Second, RRM domains are able to bind RNA with affinities ranging from very high to weak, and the structural and thermodynamic determinants of the RNA-binding affinity still need to be elucidated. Third, as it is now demonstrated that some RRM domains are specific to protein recognition rather than RNA binding, which of the identified RRM domains are true RNA-binding domains and which ones are not? In some cases, the primary sequence can differentiate between these behaviors, as for the novel UHM domain, but in other cases, such as Y14 and UPF3, structural determinants other than the amino acid sequence must be present; these are still unknown and need to be identified. Fourth, it is established that a large number of proteins contain both RRM and auxiliary domains, such as zinc fingers, that are also involved in nucleic acid binding. No structural studies, however, indicate whether these two types of RNA-binding domain within the same protein influence each other for RNA binding. Finally, it has recently been discovered that the RRM domain, for a long time thought to belong exclusively to the eukaryotic world, is also present in bacteria, viruses and mitochondria. From an evolutionary point of view, it would be very interesting to investigate the function of this domain in such organisms and perhaps to discover their common ancestor. In conclusion, further structural investigations of RRM domains, possibly coupled with thermodynamic and kinetic studies, are still needed to confirm present hypotheses and possibly to reveal more surprises.

Acknowledgements

The authors would like to acknowledge the financial support of the Fondation Schlumberger pour l'Education et la Recherche (postdoctoral fellowship), the Swiss National Science Foundation (Nr. 31–67098.01), the Roche Research Fund for Biology at the ETH Zurich and the SNF NCCR structural biology to FHTA.
