Keywords:

  • DNA;
  • dynamics;
  • endonuclease;
  • interaction;
  • nucleosome;
  • protein;
  • ribosome;
  • RNA;
  • structure;
  • transient

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Replication and transcription regulation – proteins in search of their sites on the nucleic acids
  5. Interactions lost in translation
  6. The transient positioning of the nucleosome along the genome
  7. Engineering and design of protein-nucleic acid interactions – lessons from endonucleases
  8. Challenges and the way ahead
  9. Acknowledgements
  10. References

The great pace of biomolecular structure determination has provided a plethora of protein structures, but not as many structures of nucleic acids or of their complexes with proteins. The recognition of DNA and RNA molecules by proteins may produce large and relatively stable assemblies (such as the ribosome) or transient complexes (such as DNA clamps sliding through the DNA). These transient interactions are most difficult to characterize, but even in ‘stable’ complexes captured in crystal structures, the dynamics of the whole or part of the assembly pose great technical difficulties in understanding their function. The development and refinement of powerful experimental and computational tools have made it possible to learn a great deal about the relevance of these fleeting events for numerous biological processes. We discuss here the most recent findings and the challenges that lie ahead in the quest for a better understanding of protein–nucleic acid interactions.


Abbreviations
PCNA, proliferating cell nuclear antigen; RCC1, regulator of chromosome condensation 1

Introduction

Protein–nucleic acid interactions and structural genomics

In their celebrated reports on the double helix, Watson and Crick showed the strong structure–function relationship in the DNA molecule. This relationship is more intricate in the case of RNA, because of its larger structural and functional diversity, and even more so when we consider protein·nucleic acid complexes, given the much larger diversity found in proteins.

Rapid genome-sequencing methods, large-scale gene expression analysis and high-throughput structural genomics projects have greatly increased the number of known biomacromolecular structures. Currently, about 72 000 structures are deposited in the Protein Data Bank, but only 3% are nucleic acids and about 4% are protein·nucleic acid complexes. It is difficult to know whether these figures mirror the prevalence of proteins and their complexes in the cell, or whether they arise from the greater difficulties in the identification and experimental determination of protein·nucleic acid complexes. Structural genomics initiatives target the low-hanging fruit: small globular proteins that can be easily expressed as soluble material in heterologous systems. An analogous endeavor for RNA molecules has not yet been initiated, very probably because of the experimental difficulties [1]. Indeed, preparing large amounts of homogeneous RNA for crystallization is not trivial. In addition, RNA samples need careful handling, are notably difficult to crystallize, give poor contrast in cryo-electron microscopy, and suffer from severe signal overlap in NMR spectra. Protein–protein interactions may also be transient and difficult to capture experimentally, but they have been intensively targeted on a large scale by means of complementary methods, such as yeast two-hybrid screening and tandem affinity purification followed by MS. Still, in a systematic exploration of protein complexes in the yeast interactome by tandem affinity purification followed by MS [2], proliferating cell nuclear antigen (PCNA) was not detected in any of the 589 purified complexes, despite being a very promiscuous protein and an essential component of the replication machinery [3,4].

Replication and transcription regulation – proteins in search of their sites on the nucleic acids

PCNA belongs to the group of DNA sliding clamps, which are multimeric toroidal-shaped structures that encircle the DNA duplex and act as platforms for replicative polymerases and other proteins. These processivity factors enable the polymerases to add thousands of bases per second without detaching from the genomic template. The crystal structure of the homotrimeric yeast PCNA bound to a DNA duplex was recently solved [5], but only a few of the bases were seen, probably because of the transient nature of the interaction in solution (Fig. 1). The recent assignment of the NMR spectrum of the PCNA ring [6,7] provides the basis for solution studies of its interactions with DNA and other proteins [8,9]. The yeast helicase minichromosome maintenance complex forms part of the prereplicative complex, and has been recently reconstituted and loaded onto dsDNA [10]. Electron microscopy shows a barrel of two head-to-head hexamers that encircles a stretch of DNA of approximately 68 bp and passively slides along the DNA duplex.

Figure 1.  Structure of a PCNA·DNA complex. Two views of the homotrimeric yeast PCNA ring bound to a short DNA duplex, as deposited in the Protein Data Bank (entry 3K4X). The three polypeptide chains are shown as ribbons of different colors, and the DNA as an orange rod (backbone) and blue–green sticks (bases). The figure was prepared with pymol (http://www.pymol.org).

Sliding on the DNA may be common to proteins other than DNA clamps before they bind to their specific target sequences. Both one-dimensional diffusion of proteins on the DNA (sliding) and direct transfer between distinct binding sites (translocation) would accelerate the search process relative to the diffusion-controlled association–dissociation mechanism (Fig. 2) in the presence of a huge background of nonspecific DNA, as occurs in the nucleus of the cell, with an estimated DNA concentration of 100 g·L⁻¹ [11]. Sliding and translocation events involve transient interactions that are difficult to observe and even more difficult to quantify. Crystal structures sometimes provide hints about these events, e.g. by the lack of electron density for DNA or protein regions, or by the observation of different conformations of amino acid side chains associated with the nucleic acid. Analysis in solution by NMR is a more powerful approach for characterizing these systems [12], allowing the study of the kinetics of translocation [13,14], as well as the structures of transient, nonspecific complexes. For instance, the structure of the Lac repressor bound to a nonspecific (low-affinity) DNA sequence suggests that binding is primarily driven by electrostatics, as most of the protein–DNA interactions do not involve the bases, but the phosphates and sugars of the DNA backbone [15]. Most of these interactions are preserved in the complex with the specific sequence, but, in addition, numerous interactions with the bases take place. When the overall structures of the two complexes are compared, neither the DNA nor the protein undergoes large conformational changes, but a protein segment that is disordered in solution becomes less flexible in the complex with the low-affinity DNA, and is structured into an α-helix in the complex with the specific DNA.

Therefore, the large local conformational landscape that the protein populates in solution is reduced upon DNA binding, and much more so when it binds specifically to its high-affinity sequence. The protein conformational landscape can be narrowed by small-molecule allosteric effectors favoring efficient DNA binding, as found for the transcriptional activator CAP [16], and the landscape of free DNA includes transient conformations whose relevance still needs to be investigated [17]. Many transcription factors bind to thousands of places in the genome, not necessarily located in proximal promoter regions, and dissociate very fast in vivo, which may be relevant for long-range and combinatorial regulation of transcription [18].

Figure 2.  Proteins find their binding sites in an ocean of DNA sequences. Scheme of a nucleic acid-binding protein (gray ellipses) involved in four dynamic equilibria (double arrows) and one-dimensional diffusion on the DNA double helix (single arrows). Binding to sites A, B or C occurs with different affinities (different association and dissociation rate constants, k_on and k_off). Exchange between sites A and B (located close together in the DNA sequence) can occur through association–dissociation–reassociation or via diffusion on the DNA. Exchange between sites B and C (located far away in the DNA sequence but close enough in space to collide) can occur through association–dissociation–reassociation or via direct transfer (k_dt). For the sake of clarity, some of the species participating in the individual equilibria are omitted.
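The argument that sliding combined with occasional relocation speeds up the search can be illustrated with a toy lattice model. The sketch below is not taken from the cited studies: the function names, the circular lattice, the probability `p_hop` of leaving the DNA at each move, and the time cost `hop_cost` assigned to each relocation (standing in for a slow three-dimensional excursion) are all assumptions chosen for illustration only.

```python
import random

def search_time(n_sites, target, p_hop, hop_cost, rng):
    """Time for one protein to reach `target` on a circular lattice of DNA sites.

    Each move is either a one-site slide (cost 1) or, with probability
    `p_hop`, a relocation to a uniformly random site (cost `hop_cost`),
    a crude stand-in for dissociation-reassociation or direct transfer.
    """
    position = rng.randrange(n_sites)      # initial nonspecific binding site
    elapsed = 0
    while position != target:
        if rng.random() < p_hop:
            position = rng.randrange(n_sites)                      # relocate
            elapsed += hop_cost
        else:
            position = (position + rng.choice((-1, 1))) % n_sites  # slide
            elapsed += 1
    return elapsed

def mean_search_time(n_sites, p_hop, hop_cost, trials=200, seed=0):
    """Average search time over `trials` independent runs (fixed seed)."""
    rng = random.Random(seed)
    target = n_sites // 2
    total = sum(search_time(n_sites, target, p_hop, hop_cost, rng)
                for _ in range(trials))
    return total / trials

if __name__ == "__main__":
    # Compare pure sliding, a mixed strategy, and pure relocation.
    for p_hop in (0.0, 0.05, 1.0):
        print(p_hop, mean_search_time(200, p_hop, hop_cost=50))
```

Under these assumed parameters the mixed strategy should reach the target faster, on average, than either pure sliding or pure relocation. The model ignores sequence-dependent affinities and the three-dimensional geometry of the DNA, so it illustrates the principle only and is not a quantitative description of any system discussed here.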

Electrostatics plays a driving role in transient protein·nucleic acid interactions, as well as in selecting and stabilizing the specific ones. Indeed, it may be the most important factor in indirect readout, as opposed to direct readout. These terms distinguish recognition mechanisms based on details of the DNA structure that facilitate protein binding (indirect) from those based on specific amino acid–base contacts (direct). A recent examination of protein·nucleic acid structures has shown how minor groove narrowing enhances the negative electrostatic potential of DNA and forms an arginine-binding site that is widely used in protein–nucleic acid recognition [19]. Minor groove width is primarily DNA sequence-dependent (A-tracts tend to narrow the groove, whereas GC pairs tend to widen it), although the geometry observed in a given complex is probably the result of both intrinsic and protein-induced conformational effects.

Interactions lost in translation

The 2009 Nobel Prize in Chemistry for studies of the structure and function of the ribosome rewarded a long-term effort by several laboratories. The ribosome was probably the first biomacromolecular machine to appear in the early stages of life, and performs its function in essentially the same way in the three domains of life. The ribosome translates the three-base codons of mRNAs into the amino acid sequence of the proteins encoded in the corresponding gene (Fig. 3). Because of its size (about 2.5 MDa) and lack of symmetry, it took a long period of refinement of sample preparation and the use of modern diffraction instrumentation and methodology to obtain the high-resolution structure of the 70S ribosome [20]. In the process, a wealth of information has been obtained about the mechanism by which the ribosome attains its high level of accuracy in translation, its catalytic triad (rRNA, ribosomal protein, and the peptidyl-tRNA substrate), and the mode of action of many antibiotics, enabling the design of novel ones (for a brief review, see the Nobel Foundation Scientific Background published by the Royal Swedish Academy of Sciences at http://nobelprize.org/nobel_prizes/chemistry/laureates/2009/cheadv09.pdf).

Figure 3.  Structure of the 70S ribosome. The structure of the ribosome of Thermus thermophilus (Protein Data Bank entries 1GIX and 1GIY) is shown, with the rRNA molecules represented by thin coils, the tRNAs by spheres, the mRNA by a thick coil, and the proteins by ribbons. In the two views shown, the 50S subunit is at the top and the 30S subunit is at the bottom. The figure was obtained from Proteopedia [55].

However, translation is a dynamic process, the ribosome is a highly dynamic machine, and the crystal structures can provide only snapshots of intermediates along the process. It will be very difficult to obtain crystal structures of all representative states of the ribosome in action, but a low-resolution picture has emerged from time-resolved electron microscopy of the Escherichia coli ribosome. By unbiased hierarchical classification of 2 000 000 images, 50 structures of the ribosomal substates during translocation were refined, and the trajectories of the two tRNAs as they move through the ribosome were visualized with a resolution in the 10–20-Å range [21]. Translocation is the final step in polypeptide chain elongation, and involves the concerted movement of the tRNAs, the mRNA, and the 30S subunit relative to the 50S one. The authors used a molecular system in which retrotranslocation was the actual movement observed. After addition of tRNAfMet to ribosomes loaded with fMetVal-tRNAVal, retrotranslocation occurred on a scale of several minutes, and the samples at different time points were extracted and frozen for cryo-electron microscopy. It was found that, at physiological temperatures, no distinct 30S subunit could be outlined, as it existed in dynamic equilibrium with a large number of conformational substates. The emerging picture of the ribosome during translocation is that of a machine that couples spontaneous, thermally driven conformational changes to directed movement.

Transient interactions also occur in tRNA loading. Whereas each tRNAXxx is normally loaded with the amino acid Xxx corresponding to its anticodon by a specific synthetase, most bacteria and all archaea lack glutaminyl-tRNAGln synthetase. They produce Gln-tRNAGln in a two-step pathway: glutamylation of the tRNAGln (by the same low-specificity enzyme that glutamylates the tRNAGlu), followed by amidation by the corresponding amidotransferase. The crystal structure of the ‘glutamine transamidosome’ of Thermotoga maritima [22] shows that the anticodon-binding domain of the synthetase recognizes the features common to tRNAGln and tRNAGlu (the second and third bases), whereas the so-called tail domain of the amidotransferase recognizes the outer corner of the tRNAGln (and is thus specific for tRNAGln). The two enzymes bound to the tRNAGln assume alternative conformations for the two consecutive reactions. The catalytic centers of the two enzymes compete for the acceptor arm of tRNAGln, and therefore cannot adopt their productive forms simultaneously. Hinge polypeptide regions between the catalytic and anticodon-binding domains of the synthetase, and between the catalytic and tail domains of the amidotransferase, allow both enzymes to adopt the productive or the nonproductive forms and so to cooperate in Gln-tRNAGln synthesis, with a low probability of releasing the intermediate Glu-tRNAGln species. This ‘alternative conformation’ mechanism may be more common than expected in consecutive enzymatic reactions.

The transient positioning of the nucleosome along the genome

The compaction of DNA molecules inside cells occurs by supercoiling and binding to specific proteins. A supercoiled DNA duplex can form a toroid (the nucleosome, as in eukaryotes) or a plectoneme (interwound, as in bacteria). This distinction is relevant not only for DNA packing, but also for transcription, replication, and repair. The two forms differ not only in how accessible the DNA is, but also in the degree of twist (overtwisted in the nucleosome and undertwisted in the plectoneme). However, it has been argued that a common topology for bacterial and eukaryotic DNA-based processes might exist, as the ejection of a histone octamer would convert the nucleosome into a plectoneme [23].

The crystal structure of the nucleosome core particle [24] shows a compact assembly of 147 bp wrapped around a disk formed by an octamer of histone proteins (two copies of each one of the four core histones). However, this picture is deceptively static, because, in the chromatin, the nucleosome rotational and translational positioning is not fixed. Nucleosome rotational positioning (or register) defines the orientation of the DNA helix on the histone surface, and a 10-bp periodicity is observed, reflecting a preference for sequences that face inwards or outwards with respect to the histones and optimize DNA bending. Analysis of the minor groove width along the double helix in 35 high-resolution crystal structures of nucleosomes identified a pattern of 14 minima corresponding to regions where the DNA bends and has close contacts with histone arginine side chains [19]. The analysis of DNA sequences bound in vivo by yeast nucleosomes reveals a periodicity for A-tracts three bases long, with an average of 10 A-tracts per nucleosomal DNA. Thus, even though long A-tracts tend to be excluded from the nucleosome [25], A-tracts exist and facilitate the bending of the DNA around the histone core.
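The kind of periodicity analysis described above can be sketched in a few lines. The example below is a minimal illustration, assuming that an A-tract is simply a run of three or more consecutive A bases on the given strand; the sequence `toy_seq` is synthetic and was built by hand to have a 10-bp repeat, so it is not real nucleosomal DNA, and the function names are hypothetical.

```python
import re
from collections import Counter

def a_tract_starts(seq, min_len=3):
    """Start positions (0-based) of every run of at least `min_len` A bases."""
    pattern = re.compile("A{%d,}" % min_len)
    return [m.start() for m in pattern.finditer(seq.upper())]

def spacing_histogram(starts):
    """Count the distances between consecutive A-tract start positions."""
    return Counter(b - a for a, b in zip(starts, starts[1:]))

# Synthetic 147-bp sequence: ten AAA tracts spaced 10 bp apart, then GC filler.
toy_seq = ("AAA" + "GCGCGCG") * 10 + "GC" * 23 + "G"

starts = a_tract_starts(toy_seq)
hist = spacing_histogram(starts)
print(len(toy_seq), len(starts), hist.most_common(1))
```

For real data, the same histogram would be accumulated over many nucleosome-bound sequences aligned on their dyad positions, and a peak near 10 bp would indicate the rotational preference discussed in the text.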

Translational positioning [25] is strongly influenced by the spacing between nucleosomes, but this spacing is variable, with linker DNA regions in the range of 10–90 bp, and a given nucleosome can invade its neighbor’s territory [26]. Recent reports [27,28] indicate that sequence-dependent histone–DNA interactions have a predominant influence on the measured nucleosome occupancy (the average level of histone octamers on a given DNA region in a population of cells) but not on nucleosome positioning (the extent to which each of the octamers of the population found in that DNA region deviates from its consensus location) [29–31]. Thus, the nucleosomal pattern in DNA coding regions observed in vivo is not determined by DNA sequence preferences for octamer binding, but arises primarily by statistical positioning from a barrier near the promoter, and this barrier involves an unknown aspect of transcriptional initiation by RNA polymerase II [28]. Nucleosomal assembly in vitro, however, is sequence-dependent.
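The distinction between occupancy and positioning can be made concrete with a small numerical sketch. The definitions below are one possible reading of the verbal definitions in the text, not the measures used in the cited reports: occupancy is taken as the fraction of cells in which a nucleosome overlaps the region, and positioning as the standard deviation of the dyad positions found in that region. The data sets and function names are invented for illustration.

```python
from statistics import pstdev

NUCLEOSOME_BP = 147  # DNA length of the core particle, as given in the text

def occupancy(dyads_per_cell, region_start, region_end):
    """Fraction of cells in which some nucleosome overlaps the region."""
    half = NUCLEOSOME_BP // 2
    covered = sum(
        1 for dyads in dyads_per_cell
        if any(d - half <= region_end and d + half >= region_start for d in dyads)
    )
    return covered / len(dyads_per_cell)

def positioning_spread(dyads_per_cell, region_start, region_end):
    """Standard deviation (bp) of dyad positions that fall inside the region."""
    in_region = [d for dyads in dyads_per_cell for d in dyads
                 if region_start <= d <= region_end]
    return pstdev(in_region) if in_region else None

# Hypothetical dyad positions in five cells; the last cell has no nucleosome.
well_positioned = [[500], [501], [499], [500], []]
fuzzy = [[450], [550], [480], [520], []]

for name, cells in (("well positioned", well_positioned), ("fuzzy", fuzzy)):
    print(name, occupancy(cells, 400, 600), positioning_spread(cells, 400, 600))
```

In this toy example both populations have the same occupancy, but the octamers of the second population deviate far more from their consensus location, which is the sense in which the two quantities can be governed by different factors.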

Obtaining well-diffracting crystals of nucleosomes was a difficult task, and was strongly dependent on the DNA sequence. All structures corresponded to nucleosomes assembled from purified histones and human α-satellite DNA sequences until very recently, when two new crystal structures of nucleosomes containing the strongest known histone octamer-binding sequence were reported [32,33]. This sequence is the Widom 601 DNA, the de facto standard for in vitro nucleosome reconstitution in chromatin biology research because of its tight binding, but it is a synthetic repetitive sequence that may not be the best representative of real genomic sequences assembled into nucleosomes. The two structures are very similar to the earlier ones, but with increased DNA twisting and a 145-bp core particle instead of the canonical 147-bp one. The increased twist occurs at two superhelical regions, which are the same regions where some of the histone–DNA contacts differ from those in the α-satellite nucleosomes. Therefore, the structure of the nucleosome can adapt to small variations in DNA length. One of these two structures also contains the protein regulator of chromosome condensation 1 (RCC1, also known as RanGEF or Ran guanine nucleotide exchange factor), with implications for nuclear transport and mitosis. This structure is the first to show how a nonhistone protein recognizes and binds to the nucleosome (Fig. 4). It was found that arginines of the switchback loop of RCC1 interact with an acidic patch on the histone H2A–H2B dimer, whereas the DNA-binding loop interacts with phosphates of the nucleosomal DNA. These results are consistent with RCC1 being a chromatin factor that is not DNA sequence-specific. Interestingly, the acidic patch on the nucleosome is the same as that occupied by the histone H4 tail of a neighboring nucleosome in the crystal lattice of the nucleosome [34].

Figure 4.  Structure of the nucleosome with bound RCC1 proteins. Ribbon diagram of the structure of the RCC1·nucleosome core particle complex assembled from Drosophila melanogaster RCC1, Xenopus laevis histones, and the Widom 601 DNA (Protein Data Bank entry 3MVD). In this view, the DNA superhelix axis lies horizontally and parallel to the plane of the page. The two RCC1 molecules are shown in pale yellow and in magenta. The DNA is represented by an orange rod (backbone) and blue–green sticks (bases), and histone H3, H4, H2A and H2B are shown in different colors. The two RCC1 molecules undergo equivalent interactions on each side of the nucleosome core particle. Prepared with pymol (http://www.pymol.com).

In prokaryotes, the DNA is condensed with polyamines and proteins. In enterobacteria, the histone-like nucleoid structuring proteins perform this role and regulate gene expression in response to environmental changes. The crystals of the oligomerization domain of histone-like nucleoid structuring proteins reveal an assembly of symmetry-related dimers into a superhelix, establishing a mechanism for the self-association [35]. Although there is no structure for the DNA-binding domain, the superhelical assembly suggests the formation of a complex with plectonemic DNA.

Engineering and design of protein-nucleic acid interactions – lessons from endonucleases

Restriction endonucleases are the DNA-binding proteins with the widest application in biotechnology and biomedicine [36,37], and large efforts are being invested in the identification of new nucleases or the modification of existing ones to give novel or improved DNA sequence specificities [38,39]. Recent successful examples of redesigned protein–DNA interfaces illustrate our increased ability to achieve these goals by manipulating the direct readout interactions [40]. Starting from the crystal structures of the complexes, these methods aim to optimize the amino acids at the DNA interface for affinity, but they are not efficient for complexes in which indirect readout is dominant in DNA sequence recognition [41]. An increase in the number of crystal structures of protein·DNA complexes should help to overcome this limitation [42,43].

Challenges and the way ahead

Most structural studies are carried out not with full-length proteins, but with fragments. The most frequent reasons for this are the difficulty in producing large amounts of homogeneous material for large proteins, and the simplification of the system to facilitate crystallization and/or analysis by other techniques. However, investing time and effort in preparing and analyzing the full-length protein can be extremely rewarding, as shown by the information obtained with the tumor suppressor protein BRCA2 and its interaction with DNA [44,45]. As compared with protein·protein complexes, there is still little structural information on protein·nucleic acid complexes, especially for chromatin enzymes and factors. The transient nature of many of the interactions is probably one of the major difficulties in their identification, isolation, and structural characterization. MS is emerging as a potent tool for the study of dynamic or heterogeneous protein·nucleic acid complexes [46]. Although not a high-resolution structural technique, it has the advantage of requiring very little material. Small amounts are also used in single-molecule techniques, which are becoming a fruitful approach to answering specific questions, besides providing spectacular demonstrations of our prowess in manipulating and observing protein·nucleic acid complexes. Recent studies have addressed RNA folding by a helicase [47], DNA transport [48], and DNA polymerization [49].

Crystallography will continue to be the main technique for high-resolution studies. Tomographic electron microscopy can provide structures of protein·nucleic acid complexes inside the cell [50], and NMR has the potential to do so [51]. NMR is uniquely suited to characterize folding–unfolding events that occur at disordered regions of proteins that become structured upon recognition of their target nucleic acids, and can be usefully complemented by small-angle X-ray scattering [52,53]. Intrinsically disordered proteins or protein regions will increasingly be the focus of structural studies as regulators of molecular recognition processes [54].
