Keywords:

  • affinity chromatography;
  • amino acid–base interactions;
  • protein–DNA specific recognition;
  • structural data

Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. CELLULAR PROTEIN–DNA COMPLEXES
  5. PROTEIN–DNA INTERACTIONS
  6. AMINO ACID–NUCLEIC ACIDS INTERACTION IN AFFINITY CHROMATOGRAPHY
  7. CONCLUSIONS
  8. REFERENCES

This review discusses protein–DNA interactions from different perspectives and explains their biological occurrence at the atomic level. Amino acid–nucleotide recognition has been evaluated by analysing datasets to predict the association preferences and the geometry that favours the interaction.

Based on this knowledge, an affinity chromatographic method was developed that exploits this biologically favoured contact. The implementation of this technique makes it possible to apply the concept of molecular interactions to the development of new purification methodologies. In addition, integrating the information gathered from these different perspectives can bring new insights into some biological mechanisms that are not yet fully clarified. Copyright © 2010 John Wiley & Sons, Ltd.


INTRODUCTION

In the last decade, the biological sciences have expanded significantly because of the increased need to understand cellular mechanisms. Several biological phenomena, such as chromosome organization, transcription, protein synthesis or apoptosis, are only possible with the involvement of different biomolecules. In fact, numerous findings about processes occurring inside cells only became possible once the underlying molecular evidence had been considered and understood.

This review discusses protein–DNA complexes and the interactions involved at different levels. First, the interactions between these biomolecules are considered in their biological environment. These interactions are then examined from the molecular point of view, in an attempt to define the mechanisms involved. Finally, the development of a new affinity chromatographic methodology based on the exploitation of this natural association is presented.

CELLULAR PROTEIN–DNA COMPLEXES

The interaction between proteins and nucleic acids is common to the majority of cellular mechanisms (Blanco et al., 2005; Privalov et al., 2007) and involves proteins with domains able to recognize specific DNA sequences (Table 1) (Pabo and Sauer, 1984).

Table 1. Examples of protein–DNA interactions found in different cellular mechanisms
Protein–DNA complex | Cellular mechanism associated | Interactions | Reference
DNA–histone core | DNA organization in chromosomes | Histones interact with the DNA minor groove | Richmond and Davey (2003)
DNA–H1 histone | Gene expression and regulation | | Happel and Doenecke (2009)
DNA–Hox protein | Transcription | Hox interact with TAAT motif by electrostatic interaction | Svingen and Tonissen (2006)
DNA–TBP | Transcription | TBP binds to TATA box in minor groove | Kim et al. (1993)

The evaluation of the recognition process requires structural, kinetic and thermodynamic studies (Morozov et al., 2005). For instance, structural studies have provided valuable information about protein–DNA interactions, describing the complementary surfaces involved, which are responsible for the topological recognition, and the chemistry of these surfaces, which is the basis of the chemical recognition (Luscombe et al., 2001; Hoffman et al., 2004). The structures of both proteins and nucleic acids can be diverse, and proteins may interact with the DNA molecule by presenting amino acid residues able to interact specifically with particular DNA sequences. This recognition is achieved by unique three-dimensional networks of several interactions, such as hydrogen bonds, van der Waals forces, and hydrophobic or electrostatic interactions (Pabo and Sauer, 1984).

The protein domains with a general affinity for nucleic acids mainly interact with the major groove of B-DNA, because it exposes more of the functional groups that identify a base pair. In biological complexes, it has been verified that the substitution of a single amino acid in the protein domain involved in recognition can change the DNA sequence that is recognized (Khorasanizadeh, 2004; Rohs et al., 2009). Although the exposure of the bases is an important factor in determining the interaction with proteins, these complexes are also stabilized by additional contacts between the amino acid side chains and the deoxyribose rings and phosphate groups.

The three-dimensional structures of numerous protein–DNA complexes have been determined by X-ray crystallography and NMR spectroscopy, providing a basic representation of the way that these two macromolecules interact (Segal and Widom, 2009).

DNA double helix

The general structure of B-DNA has been known since the 1950s, but only in the 1980s did it become available in atomic detail. This structure is characterized by a double helix formed by two complementary strands, in which the bases are paired and interact by hydrogen bonds and the backbone is formed by the sugar and the phosphate group (Bates and Maxwell, 2005; Clark, 2005). In this structure, where the two strands spiral around each other to form a pair of right-handed helices, the bases occupy planes that are approximately perpendicular to the long axis of the molecule and are, therefore, stacked one on top of another. Hydrophobic interactions and van der Waals forces between the stacked, planar bases provide stability for the entire DNA molecule (Bates and Maxwell, 2005; Clark, 2005). An important feature of the DNA structure is that the spaces between adjacent turns of the helix form two grooves of different dimensions, a wider major groove and a narrower minor groove, that spiral around the outer surface of the double helix. Proteins that bind to DNA often contain domains that fit into these grooves. In many cases, a protein bound in a groove is able to read the sequence of nucleotides along the DNA without having to separate the two complementary strands (Bates and Maxwell, 2005; Clark, 2005).

The DNA molecule is dynamic: its double helix is extremely flexible and may be affected by the base sequence, environmental conditions and interactions with other molecules. The many possible modes of protein–DNA recognition are related to particular features of the nucleic acid molecule, such as its flexible structure and the presence of sequence-specific variations, combined with base-specific intermolecular interactions (Rohs et al., 2009). Unfortunately, the dependence of structure on sequence is not well determined, since certain nucleotide sequences do not produce well-defined NMR maps.

The dimension and complexity of the DNA molecule have been taken into account to explain how proteins interact with it. A direct mechanism is usually proposed, in which specific contacts form between amino acid side chains and DNA bases, so that a DNA sequence is specifically recognized by the residues on the surface of a protein. Alternatively, an indirect model describes optimized protein–DNA interface geometries, taking into account the global DNA structure as influenced by sequence-dependent bending (Rohs et al., 2009). These models have to be considered simplified hypotheses for the complex mechanism underlying the recognition and binding process involving these biomolecules. Predicting the topology of the DNA molecule and knowing the protein sequence and tertiary conformation are extremely important for defining the possible contact atoms between the biomolecules. Knowledge of the molecules' chemistry, and particularly of the groups of atoms involved, is also relevant for predicting the type of intermolecular interactions that are favoured.

Protein domains

The characterization of the protein domains that interact with nucleic acids allowed the discovery of some elements normally involved in this recognition mechanism. The existence of several families of DNA-binding proteins indicates that evolution has found a number of different solutions for the construction of polypeptides that can bind to the DNA double helix. Most of the motifs found in protein–DNA interaction contain a segment, often an α-helix, that is inserted into the major groove of the DNA, where it recognizes the sequence of base pairs that line the groove. Among the most common motifs that occur in eukaryotic DNA-binding proteins are the zinc finger, the helix–loop–helix and the leucine zipper. Each provides a structurally stable skeleton on which the specific DNA-recognition surfaces of the protein can be accurately positioned to interact with the double helix (Pabo and Sauer, 1984).

The majority of transcription factors are proteins containing a motif known as the zinc finger. In this motif, the zinc ion is coordinated to two Cys and two His residues. The global structure involves a β-sheet, containing the two Cys residues, on one side of the finger and a short α-helix, containing the His residues, on the opposite side. These proteins usually have several zinc finger motifs that act independently and can project into successive major grooves in the target DNA molecule (Karp, 2007).

The helix–loop–helix (HLH) motif is characterized by two α-helical segments separated by an intermediate loop. The HLH domain is often preceded by a stretch of highly basic amino acids whose positively charged side chains contact the DNA and determine the sequence specificity (Karp, 2007).

The leucine zipper domain occurs in a protein region organized as an α-helix in which a leucine residue is repeated at every seventh amino acid. As a result of this spacing, all the leucine residues of the sequence are oriented in the same direction along the α-helix. Two α-helices of this kind are capable of zipping together to form a coiled coil. The leucine zipper motif can bind DNA because it contains a segment of basic amino acids on one side of the leucine-containing α-helix (Karp, 2007).

In general, the amino acid residues mainly involved in protein–DNA recognition are basic and positively charged. As discussed above, the complexity and specificity of the interaction between proteins and DNA are not fully explained by the protein domains, because the DNA structure, which depends on several environmental factors, also influences these contacts.
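The seven-residue spacing of the leucines can be checked directly on a protein sequence. The following is only a minimal sketch: the function name, the one-letter sequence input and the threshold of four consecutive heptads are illustrative assumptions, not criteria taken from the literature cited here, and a positive hit does not by itself establish a functional leucine zipper.

```python
def leucine_heptad_runs(sequence: str, min_repeats: int = 4) -> list[int]:
    """Return 0-based start positions where Leu (L) occurs at every seventh
    residue for at least `min_repeats` consecutive heptads.

    `min_repeats` is an illustrative threshold (assumption).
    """
    seq = sequence.upper()
    # Distance between the first and the last leucine of the run.
    span = 7 * (min_repeats - 1)
    starts = []
    for i in range(len(seq) - span):
        if all(seq[i + 7 * k] == "L" for k in range(min_repeats)):
            starts.append(i)
    return starts


# Artificial test peptide (not a real protein): Leu followed by six Ala, four times.
toy_peptide = ("L" + "A" * 6) * 4
print(leucine_heptad_runs(toy_peptide))
```

On the artificial peptide the only run starts at position 0; a sequence without the heptad spacing returns an empty list.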

DNA–histones interaction

An average human cell contains about 6 billion base pairs of DNA divided among its chromosomes, which would constitute a DNA molecule about 2 m long if joined end to end. Maintaining this DNA in the cell nucleus is only possible because of its extremely controlled organization. Within chromosomes, DNA is ordered in complexes with structural proteins into a compact structure called chromatin. The orderly packaging of eukaryotic DNA depends on histones that form a disk-shaped complex called a nucleosome (Simpson and Staffordt, 1983; Richmond and Davey, 2003).

The histones are a group of small proteins that possess an unusually high content of the basic amino acids Arg and Lys, and are divided into five classes, which can be distinguished by their Arg/Lys ratio. Experimental evidence has shown that very few amino acids in a histone can be replaced by other amino acids without severely affecting the function of the protein. Each nucleosome is organized around a nucleosome core particle containing 147 base pairs of supercoiled DNA wrapped almost twice around a complex of eight histone molecules (Simpson and Staffordt, 1983). The histone core of each nucleosome consists of two copies each of histones H2A, H2B, H3 and H4 assembled into an octamer, while histone H1 is placed outside the nucleosome core particle, binding to part of the linker DNA that connects two sequential nucleosomes (Richmond and Davey, 2003; Happel and Doenecke, 2009).

DNA and the core histones are held together by several types of noncovalent bonds, including ionic bonds between the negatively charged phosphates of the DNA backbone and the positively charged residues of the histones. In this case, the two molecules make contact at sites where the minor groove of the DNA faces the histone core. Chemical modifications of the basic amino acid residues, which may include methylation, phosphorylation and acetylation, alter the interaction between the DNA and the histones and thereby play an important role in gene expression and regulation (Karp, 2007; Happel and Doenecke, 2009).

Chromatin organization is of interest because the hierarchical chromatin structure is relevant to the vital processes of DNA replication, recombination, transcription, repair and chromosome segregation, and is also related to the pathological progression of cancer and viral disease (Richmond and Davey, 2003). Conformational differences between nucleosome core DNA and oligonucleotide DNA are probably important for the recognition, or lack of it, of nucleosome DNA by nuclear factors. Furthermore, as demonstrated by in vitro experiments, the dependence of nucleosome position and stability on base sequence derives from the energetics of sequence-dependent histone–DNA interactions (Flaus and Richmond, 1998).

Although capable of forming on any DNA sequence, nucleosomes have been shown to assemble preferentially on several different sequences (Simpson and Staffordt, 1983) and, when associated with promoter elements, they may take on a positive or negative role in transcription activation (McPherson et al., 1993). For example, a DNA site displayed on the surface of a positioned nucleosome may take on a conformation that is specifically recognized by a protein factor that facilitates activation (Li and Wrange, 1995). Alternatively, nucleosomes may partially mask DNA and therefore repress transcription.

In recent years, most of the work on chromatin structure and function has concentrated on the nucleosomal core. This focus on the core particle is mainly due to the detailed knowledge of the nucleosomal core structure derived from high-resolution crystallography (Richmond and Davey, 2003). However, recent studies by Happel and Doenecke (2009) showed that histone H1 is also involved in the regulation of gene expression, and research in this field is still in its early stages. The detailed analysis of the dynamics and the sites of interaction of histone H1 with DNA within the nucleosome may help to understand the functional contribution of H1 to the local structure of chromatin (Happel and Doenecke, 2009). Recent advances in this area are revealing that the relevance of the interaction between histones and DNA goes beyond its role in DNA organization inside the cell: the association between these biomolecules is also extremely important because it directly influences the contacts with other functional proteins.
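The figure of about 2 m can be recovered with a short calculation. The sketch below assumes ideal B-DNA with a rise of about 0.34 nm per base pair; this textbook value and the function name are assumptions introduced for illustration and are not stated in the text above.

```python
# Axial rise per base pair of ideal B-DNA, in nanometres.
# Textbook value (assumption; not given in the review itself).
RISE_PER_BP_NM = 0.34


def contour_length_m(base_pairs: float, rise_nm: float = RISE_PER_BP_NM) -> float:
    """Contour length, in metres, of a B-DNA duplex of `base_pairs` base pairs."""
    return base_pairs * rise_nm * 1e-9


genome_bp = 6e9  # base pairs per human cell, as quoted in the text
print(f"{contour_length_m(genome_bp):.2f} m")  # prints "2.04 m"
```

The result, about 2.04 m, agrees with the value quoted above and illustrates the degree of compaction that the nucleus has to achieve.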

DNA–transcription factors interaction

Transcription factors are also proteins with the capacity to bind to particular DNA sequences, and they are involved in the regulation of transcription. The proper functioning of cells involves accurate control of DNA transcription by the concerted and regulated action of transcription factors (Blanco et al., 2005). These proteins interact, with high affinity and specificity, with short DNA sequences that are usually located upstream of the gene (Blanco et al., 2005). The regulatory activity of the transcription factors is associated with their capacity to bind the RNA polymerase responsible for transcription, either directly or through other intermediary proteins. Alternatively, transcription factors can bind enzymes that modify the histones at the promoter and may induce a change in the accessibility of the DNA template to the polymerase.

A study of the Hox family of transcription factors revealed that a core DNA consensus sequence is recognized by the majority of homeodomain proteins, raising the question of how transcriptional and biological specificity can be obtained (Svingen and Tonissen, 2006). Although sequence-specific binding is a critical part of transcriptional regulation, and thus of biological specificity, DNA sequence recognition is not the only determining factor. Binding of the homeodomain to a specific DNA sequence was described using findings from both genetic and structural studies. Genetically, a conserved TAAT motif was identified as a high-affinity binding site for numerous homeodomain proteins (Svingen and Tonissen, 2006).
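Because the TAAT core is so short, it occurs very frequently in any sequence, which illustrates why the motif alone cannot account for specificity. The sketch below locates the motif on both strands of a DNA string; the function names and the example sequence are hypothetical and serve only as an illustration.

```python
# Translation table for complementing an upper-case DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_motif(dna: str, motif: str = "TAAT") -> list[tuple[int, str]]:
    """Return (0-based position, strand) for each occurrence of `motif`.

    "+" marks a match on the given strand; "-" marks a match of the
    reverse complement, i.e. the motif read on the opposite strand.
    """
    dna = dna.upper()
    motif = motif.upper()
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(dna) - len(motif) + 1):
        window = dna[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        if window == rc and rc != motif:  # avoid double-counting palindromes
            hits.append((i, "-"))
    return hits


# Made-up sequence for illustration only.
print(find_motif("GGTAATCCATTAGG"))  # [(2, '+'), (8, '-')]
```

Even in this 14-nucleotide example the motif is found once on each strand, so additional determinants are needed to select a unique target site.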

In general, it has not been possible to formulate a global recognition code that directs the specificity of the interactions between protein side chains and DNA bases. However, some recurrent patterns have been described, such as the frequent involvement of amino acids with long side chains, like Arg or Gln, that can participate in bidentate contacts with the nucleic acid bases. Indeed, the most commonly occurring interactions are those of Arg with guanine and of Gln with adenine. As described for the histones, electrostatic interactions must also be considered in this case, because positively charged amino acids are present in the transcription factor domains that interact with DNA. These contacts may appear non-specific, but they play a critical role in orienting the protein so that specific interactions can be established in the major groove (Kim et al., 1993; Svingen and Tonissen, 2006). An additional determining factor for the interaction between proteins and nucleic acids is the DNA topology, since the double helix can adopt discrete conformations depending on the sequence and on the degree of hydration. Consistent with these data, both the local structure and the deformability of the DNA are known to play a major role in protein–DNA recognition. Indeed, there are many examples of protein–DNA complexes in which the DNA is significantly distorted. One particular case is the binding of the TATA-binding protein to its DNA target, in which the DNA has two 90° bends (Kim et al., 1993). In that work, the authors described side-chain/base interactions that are restricted to the minor groove and include hydrogen bonds, van der Waals contacts and Phe–base-stacking interactions (Kim et al., 1993).

In general, transcription factors do not recognize and bind a single sequence, because a small domain is not able to make a sufficient number of contacts to specify a unique target site. Some strategies developed at the biological level include the involvement of arms or tails that recognize additional features of the DNA, particularly in the minor groove. The dimerization of some transcription factors also improves the recognition and specificity of a particular sequence (Karp, 2007). The remarkable mechanism underlying the capacity of two large biomolecules to specifically recognize small sequences is still under investigation, with details yet to be understood. Structural biology has made efforts to understand how transcription factors bind and recognize their specific DNA targets, and numerous protein–DNA complexes have been analysed by X-ray crystallography or NMR spectroscopy. In addition, significant progress was recently achieved in measuring the affinity of factors to a relatively small number of sequences, using high-throughput technologies such as ChIP–chip, a technique that combines chromatin immunoprecipitation (ChIP) with microarray technology (chip) (Segal and Widom, 2009). Despite the improvement over the information obtained with earlier techniques such as footprinting or Southwestern blotting, the identification of the much smaller sequence motifs involved in these interactions still requires post-processing computation (Segal and Widom, 2009). Because of the challenging nature of this issue, research is directed towards understanding the interactions between nucleic acids and proteins that allow DNA sequences to be translated into transcriptional behaviours (Segal and Widom, 2009). The generally accepted idea behind this translation process is that DNA-binding molecules have intrinsic affinities for DNA sequences that are specific to each molecule; thus, every sequence defines a unique affinity landscape with respect to each molecule.

PROTEIN–DNA INTERACTIONS

As discussed in the previous section, interactions between proteins and nucleic acids are crucial for the understanding of numerous biological mechanisms, such as replication, transcription, splicing and DNA repair (Babu et al., 2004; Trevor et al., 2005). Protein–nucleic acid recognition can be understood in terms of atomic interactions between amino acids and nucleotides (Rhodes and Stephen, 2000; Lavery, 2005; Coulocheri et al., 2007; Hégarat et al., 2008). These complexes have been analysed by several authors for more than 30 years. Seeman et al. (1976) predicted, from crystallographic analysis, potential positions for hydrogen bonds between the atoms of the amino acid side chains and the groups of the base pairs in the double helix, or between amino acids and the backbone. These interactions are indeed the most commonly observed in protein–DNA complexes. However, the authors considered only the role of hydrogen bonding, and not other types of interaction, in the recognition process. Pabo and Sauer (1984), and later Matthews (1988), hypothesized that the recognition of some specific base sequences could be derived from hydrogen bonds and van der Waals interactions. Most of these H-bonds involved the amino acid side chains and either the atoms at the base edges of the nucleotides or the backbone. Contacts with the base edges contribute to the specificity of the protein, while contacts with the backbone appear to be important for the stabilization and orientation of the complex. Nevertheless, these studies were restricted by the small number of high-resolution structures available and were limited to descriptions of the interactions within each complex (Pabo and Sauer, 1992; Steiner and Saenger, 1993; Spolar and Record, 1994). Based on the structures available, however, Suzuki (1994) demonstrated that the occurrence of different interactions may be explained by stereochemical requirements.
The analyses of the crystal structures show that some residues bind to just one or two types of bases while others do not discriminate. For instance, residues such as Ala, Val, Ile, Leu, Met, Phe, Tyr, Trp and Thr, described as hydrophobic, bind the methyl group of T; Arg and Lys bind predominantly G; Asp and Glu bind C or A; Asn and Gln bind all bases but most often A; and some residues, such as Ser, bind all four bases.
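These qualitative preferences can be collected in a small lookup table. The sketch below encodes only the associations listed in the paragraph above; the dictionary layout and the helper function are illustrative assumptions, and the table records the partners named in the text rather than measured frequencies.

```python
# Most frequent base partners per residue, as summarized in the text above.
# Residues not mentioned there are deliberately left out.
MOST_FREQUENT_PARTNERS: dict[str, tuple[str, ...]] = {
    # Residues described as hydrophobic contact the methyl group of T.
    **dict.fromkeys(
        ["Ala", "Val", "Ile", "Leu", "Met", "Phe", "Tyr", "Trp", "Thr"], ("T",)
    ),
    # Basic residues bind predominantly G.
    **dict.fromkeys(["Arg", "Lys"], ("G",)),
    # Acidic residues bind C or A.
    **dict.fromkeys(["Asp", "Glu"], ("C", "A")),
    # Amide residues bind all bases, but most often A.
    **dict.fromkeys(["Asn", "Gln"], ("A",)),
    # Ser shows no discrimination among the four bases.
    "Ser": ("A", "C", "G", "T"),
}


def residues_for(base: str) -> list[str]:
    """Return, alphabetically, the residues whose listed partners include `base`."""
    return sorted(
        res for res, bases in MOST_FREQUENT_PARTNERS.items() if base.upper() in bases
    )


print(residues_for("G"))  # ['Arg', 'Lys', 'Ser']
```

Querying the table by base rather than by residue makes it easy to see that G is contacted mainly by the basic residues, in line with the Arg–G and Lys–G pairs discussed in this review.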

Further studies (Mandel-Gutfreund et al., 1995; Mandel-Gutfreund et al., 1998a) were carried out with an increasing number of complexes at high resolution, as more structures became available in the Protein Data Bank (Berman et al., 2000). A wide variety of methodological approaches were applied, generally based on geometric considerations (i.e. analysis of spatial distances and atom–atom interaction angles) (Mandel-Gutfreund et al., 1998a) or on statistical/probabilistic considerations (Luscombe et al., 2000; Reddy et al., 2001). Other types of interactions between amino acids and bases in the major or minor grooves were also investigated, including the effect of water molecules (Luscombe et al., 2001; Morávek et al., 2002), cation–π interactions (Wintjens et al., 2000; Rooman et al., 2002) and stacking interactions (Rooman et al., 2002; Biot et al., 2004). Consequently, databases of these interactions were created, such as the Protein–Side Chain Interactions database and the Protein–Nucleic Acid Interaction server (Flocco and Mowbray, 1994), which allowed the prediction of common rules that govern the recognition process. This database was later improved, since overlapping structures could not be analysed and interactions involving the peptide backbone and the sugar–phosphate backbone were not included. Hoffman et al. (2004) created the amino acid–nucleotide interaction database (AANT), which deconstructs 930 protein–nucleic acid structures into sets of amino acid–nucleotide interactions, allowing the identification of residues that participate in multiple interactions with the nucleotide moieties, the modelling of new nucleic acid structures, the placement of sterically similar interactions and the alteration of specificities. Nevertheless, the AANT only predicted hydrogen bonds. Subsequently, larger databases of protein–DNA and protein–RNA complexes were constructed, analysing several types of interactions in both kinds of complex (Lejeune et al., 2005; Mukherjee et al., 2005). More recently, Marabotti et al. (2008), using computational methods, determined accurate protein–DNA interaction motifs, including the frequency of the contacts, the contribution of bridging water molecules, the energetic contribution of each motif and the relative preference of each amino acid–base contact in global protein–DNA binding.

This review focuses on the rules of amino acid–nucleotide recognition based on size, composition, specific interactions, shape complementarity and geometry of the binding. The statistical analysis was based on crystal structures of protein–DNA complexes and also on some computational studies. Almost all types of interactions were assembled, such as electrostatic interactions, H-bonds (X–H···Y H-bonds and C–H···O H-bonds), van der Waals, hydrophobic, water-mediated and cation–π interactions, and the groups involved were identified, namely the side chains of the amino acids and the phosphate, sugar or base edge of the nucleotides.

Analysis of amino acid–nucleotide databases

The majority of the databases of amino acid–nucleotide interactions were constructed from the analysis of X-ray structures of protein–DNA complexes available in the Protein Data Bank (Berman et al., 2000).

Normally, crystal structures of protein–DNA complexes with a resolution of 3 Å or better were selected and, with appropriate software, all repeated biological units and all complexes containing co-crystallized molecules were discarded. Afterwards, several parameters can be determined from the well-resolved complex structures, such as the frequency, propensity and energy of the amino acid–nucleotide interactions and the energetics of water-mediated interactions. Different programs were used to determine the atoms involved in hydrogen bonds and van der Waals contacts in the structures. In some studies, the analysis of the hydrogen bonds was also expanded from simple one-to-one amino acid–nucleotide interactions towards bidentate and complex interactions (Luscombe et al., 2001).
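The selection steps described above can be sketched as a simple filter. The record type, its field names and the rule of keeping the best-resolved copy of each repeated unit are hypothetical choices made for illustration; they do not reproduce the software used in any of the cited studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComplexEntry:
    """Hypothetical summary record for one protein-DNA crystal structure."""
    entry_id: str                 # illustrative identifier
    resolution: float             # in angstroms (lower is better)
    unit_key: str                 # same key = repeated biological unit
    has_cocrystallized_molecule: bool


def select_complexes(entries, max_resolution: float = 3.0):
    """Keep well-resolved, non-redundant complexes without co-crystallized molecules.

    Entries are visited from best to worst resolution, so that the
    best-resolved copy of each repeated unit is retained (assumption).
    """
    seen_units = set()
    kept = []
    for entry in sorted(entries, key=lambda e: e.resolution):
        if entry.resolution > max_resolution or entry.has_cocrystallized_molecule:
            continue
        if entry.unit_key in seen_units:
            continue
        seen_units.add(entry.unit_key)
        kept.append(entry)
    return kept


# Made-up entries for illustration only.
entries = [
    ComplexEntry("a", 2.5, "k1", False),
    ComplexEntry("b", 1.9, "k1", False),   # better copy of the same unit as "a"
    ComplexEntry("c", 3.4, "k2", False),   # resolution worse than the cut-off
    ComplexEntry("d", 2.0, "k3", True),    # contains a co-crystallized molecule
    ComplexEntry("e", 3.0, "k4", False),
]
print([e.entry_id for e in select_complexes(entries)])  # ['b', 'e']
```

The 3 Å cut-off is exposed as a parameter because individual studies may have applied stricter resolution limits.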

The continuous improvement in the experimental methods for structure determination has led to a significant increase in the number of solved DNA–protein complexes, enabling a systematic analysis of all the contacts involved.

Numerous studies of protein–DNA complexes aimed at discovering universal recognition rules have focused either on the type of atoms involved in the amino acid–nucleotide interactions (base edge or backbone) or on the main types of interaction (H-bonds, electrostatic, van der Waals). For this purpose, selected studies published in the literature are discussed and compared here.

Amino acid–base interactions

Seeman et al. (1976) defined potential positions for H-bonds in the DNA grooves, W1 and W2 in the major (wide) groove and S1 and S2 in the minor (small) groove (see Figure 1).

Figure 1. Illustration of the DNA positions in the grooves determined by Seeman et al. (1976). W stands for potential recognition sites in the major groove while S stands for sites in the minor groove. (a) Base pairing between adenine (A) and thymine (T), showing the two H-bonds made between them. (b) Base pairing between guanine (G) and cytosine (C), showing the three H-bonds made.

Most of the amino acid interactions in the major groove take place at position W2, while those in the minor groove occur at position S1. The preference for W2 over W1 arises because all four bases contain potential hydrogen-bonding atoms at that position. Moreover, guanine shows the highest degree of participation in H-bonds, which is attributed to its purine base, followed by adenine (A), cytosine (C) and thymine (T). More specifically, the relative preferences of amino acids for bases depend basically on the donor and acceptor patterns of the bases (see Figure 1): in the major groove, one donor N6 and one acceptor N7 of A, two acceptors O6 and N7 of G, one acceptor O4 of T and one donor N4 of C; in the minor groove, one acceptor N3 of A, one acceptor N3 and one donor N2 of G and one acceptor O2 each of C and T.

Further work (Matthews, 1988; Pabo and Sauer, 1992) placed greater emphasis on explaining binding specificity through the inherent characteristics of the amino acids, such as chemical structure and size. Amino acids with different geometries can form a similar type of interaction with bases. Based on these features, Suzuki studied 20 crystal structures of protein–DNA complexes, found amino acids that bridge two base pairs by H-bonds and electrostatic interactions, and classified them into three groups: the double acceptor type (Asp and Glu), the double donor type (Arg and Lys) and the acceptor + donor type (Asn, Gln, Cys, Ser, Thr and Tyr). Asn, Gln, His, Ser, Cys, Thr and Tyr can act either as double acceptor or as double donor (Suzuki, 1994). The author also argued that double H-bonds between amino acids and base pairs are necessary for the discrimination of different bases by amino acids. Mandel-Gutfreund and co-workers characterized the hydrogen bonds in 28 protein–DNA complexes and confirmed the existence of significant interdependence between amino acid–base pairs in the interactions, in order to identify common principles used in specific recognition (Mandel-Gutfreund et al., 1995; Mandel-Gutfreund et al., 1998a). The persistence of the pairs Arg–G, Lys–G, Asn–A and Gln–A, and of other amino acids that prefer one base over another, is attributed to the environment of that base or to the fact that it enables bifurcated bonds. The limitations of these studies were the small number of high-resolution structures available and the fact that the interactions were described only in the context of the complex they came from.
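The donor and acceptor patterns just listed can be written down as a small data structure, which makes it easy to compare what each groove exposes. The sketch below uses only the atoms named in the paragraph above; reducing each base to a count of donors and acceptors is a deliberate simplification introduced here (it ignores the positions W1/W2 and S1/S2 and any non-hydrogen-bonding groups), and the function names are illustrative.

```python
# Hydrogen-bond donors and acceptors exposed by each base, per groove,
# exactly as listed in the text above.
GROOVE_ATOMS = {
    "major": {
        "A": {"donor": ["N6"], "acceptor": ["N7"]},
        "G": {"donor": [], "acceptor": ["O6", "N7"]},
        "T": {"donor": [], "acceptor": ["O4"]},
        "C": {"donor": ["N4"], "acceptor": []},
    },
    "minor": {
        "A": {"donor": [], "acceptor": ["N3"]},
        "G": {"donor": ["N2"], "acceptor": ["N3"]},
        "C": {"donor": [], "acceptor": ["O2"]},
        "T": {"donor": [], "acceptor": ["O2"]},
    },
}


def signature(groove: str, base: str) -> tuple[int, int]:
    """Return (number of donors, number of acceptors) for a base in a groove."""
    atoms = GROOVE_ATOMS[groove][base]
    return len(atoms["donor"]), len(atoms["acceptor"])


def bases_distinguishable(groove: str) -> bool:
    """True if every base has a different donor/acceptor count in this groove."""
    signatures = [signature(groove, base) for base in GROOVE_ATOMS[groove]]
    return len(set(signatures)) == len(signatures)


print(bases_distinguishable("major"))  # True
print(bases_distinguishable("minor"))  # False
```

Under this simplified count the four bases all differ in the major groove, whereas A, C and T present the same pattern in the minor groove, which is consistent with the preference of sequence-reading domains for the major groove.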

Amino acid–base and backbone interactions

Around 2000, additional high-resolution protein–DNA structures became available in the Protein Data Bank (PDB) (Berman et al., 2000). Thus, more studies focused on the atomic contacts of protein–DNA and protein–RNA complexes were carried out (Mandel-Gutfreund and Margalit, 1998b).

Luscombe et al. (2001) analysed 129 protein–DNA complexes and investigated, for the first time, the roles of hydrogen bonds, van der Waals contacts and water-mediated interactions. They compiled the total number of these three interactions among the 20 amino acids and the four DNA bases and backbone (sugar and phosphate groups), and highlighted their contributions within the dataset. They found 1111 hydrogen bonds, 3576 van der Waals contacts and 821 water-mediated bonds. As described in previous studies, an essential difference exists between hydrogen bonds and van der Waals interactions, which is related to their different directionality. H-bonds are inherently directional and more specific than van der Waals contacts, since they require molecules with complementary hydrogen donor and acceptor groups. Water-mediated bonds are indirect hydrogen bonds between amino acids and nucleotides; they also contribute to the recognition process by mediating the shape complementarity between these components and their close packing.
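The relative weight of the three interaction types follows directly from these counts. The short calculation below uses only the three totals quoted above; the variable and function names are illustrative.

```python
# Interaction counts reported by Luscombe et al. (2001), as quoted in the text.
COUNTS = {
    "hydrogen bond": 1111,
    "van der Waals": 3576,
    "water-mediated": 821,
}


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Return each interaction type as a fraction of all interactions."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


for name, fraction in shares(COUNTS).items():
    print(f"{name}: {fraction:.1%}")
# hydrogen bond: 20.2%
# van der Waals: 64.9%
# water-mediated: 14.9%
```

Of the 5508 interactions in total, van der Waals contacts therefore make up almost two thirds, although they are the least directional and least specific of the three types.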

The distributions of these three types of interactions, according to the participating amino acid and nucleotide components, are presented in Figure 2. As can be seen, the most prominent are the contacts between the amino acids and the oxygen atoms of the phosphate groups (O= and O– of the phosphodiester), which account for approximately 70% of the hydrogen and water-mediated bonds and approximately 50% of the van der Waals contacts. The interactions between the amino acids and the DNA base edges constitute only 32, 22 and 29% of the hydrogen bonds, van der Waals contacts and water-mediated bonds, respectively. The van der Waals contacts with bases follow a different distribution from the hydrogen bonds: thymidine interacts most, followed by adenine, guanine and cytosine, whereas for hydrogen bonds and water-mediated bonds guanine interacts most, followed by adenine, thymidine and cytosine. A significant number of van der Waals contacts with the sugar group (about 26%) should also be noted.

Figure 2. Distribution of hydrogen bonds, van der Waals contacts and water-mediated bonds according to the participating amino acids and nucleotide base or backbone group, as reported by Luscombe et al. (2001).
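To give a feel for the scale behind these percentages, the following minimal sketch converts the totals and base-edge percentages quoted above into approximate contact counts. The function and variable names are illustrative only, and the counts are rounded estimates derived from the figures in the text, not values taken from Luscombe et al. (2001).

```python
# Totals quoted in the text for the 129 protein-DNA complexes.
TOTALS = {
    "hydrogen bonds": 1111,
    "van der Waals contacts": 3576,
    "water-mediated bonds": 821,
}

# Percentage of each interaction type made with the DNA base edges (from the text).
BASE_PERCENT = {
    "hydrogen bonds": 32,
    "van der Waals contacts": 22,
    "water-mediated bonds": 29,
}


def split_base_backbone(totals, base_percent):
    """Return {interaction: (base_count, backbone_count)} as rounded estimates."""
    result = {}
    for kind, total in totals.items():
        base = round(total * base_percent[kind] / 100)
        result[kind] = (base, total - base)  # backbone = sugar + phosphate
    return result


if __name__ == "__main__":
    for kind, (base, backbone) in split_base_backbone(TOTALS, BASE_PERCENT).items():
        print(f"{kind}: ~{base} with bases, ~{backbone} with backbone")
```

Under these assumptions, roughly 356 of the 1111 hydrogen bonds involve base edges, which illustrates how strongly the backbone dominates every interaction type.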

These three interactions involve almost all the amino acids. The polar (Asn, Gln, His, Ser and Thr) and the positively charged (Arg and Lys) amino acids were the most involved in hydrogen bonding, particularly the latter. Comparatively, among amino acids the percentage of N–H···π, C–H···π, C–H···O and N–H···O hydrogen bonds is higher for His, Arg and Trp and lower between Tyr and Trp (Chakrabarti and Bhattacharyya, 2007).

The negatively charged amino acids (Asp and Glu) were less involved, presumably due to unfavourable electrostatic interactions with the phosphate groups. On the other hand, the hydrophobic amino acids Ala, Ile, Leu, Met, Val (aliphatic), Phe, Trp, Tyr (aromatic), together with Cys, Gly and Pro, are the disfavoured nucleotide partners. The amino acids most involved in van der Waals contacts were Arg, Thr, Phe, Ile, Gln, Lys, Gly, Ser and His, and those that interact least were Glu, Asp, Ala and Leu. These last observations can be attributed to the unfavourable electrostatic interactions of the two acidic residues (Glu and Asp) and to the short side chains of Ala and Leu. The affinity of Phe and His for the bases is explained by their ability to form stacking interactions. Between amino acids themselves, van der Waals and electrostatic interactions predominate for almost all side chains, except for the electrostatic terms involving the charged Lys, Arg, Asp and Glu residues, which were bound in salt bridges or networks of charged groups. In addition, edge-to-face and offset stacked aromatic interactions between Tyr and Trp contribute to stabilizing misfolded structures (Chakrabarti and Bhattacharyya, 2007).

Finally, regarding water-mediated bonds, the polar and charged amino acids are frequently involved in these interactions, and Glu and Asp also make significant contributions, presumably because of their ability to interact at a distance. Hydrophobic amino acids, such as Ala and Gly, also participate in these bonds through their main chain atoms.

This analysis also establishes clear preferences for particular amino acid–base pairings. The favoured amino acid–base hydrogen bond pairs were Arg-G and Lys-G, followed by Arg-T, Arg-A, Asn-A and Gln-A. To a lesser extent, Ser-G was observed, followed by His-G and Glu-C and then by Asn-C and Asp-C. The most significant van der Waals pairs found were Arg-G, Gln-A, Thr-T, Phe-A and His-A, while the most relevant water-mediated pair was Arg-G, followed by Asn-A, Asn-G, Lys-G and Glu-C.

Hydrogen bonds geometries

Luscombe et al. (2001) extended the analysis of the hydrogen bonds in the protein–DNA complexes from simple one-to-one amino acid–base pairs to bidentate (in which there are two or more hydrogen bonds with a base), bifurcated (interactions where a hydrogen atom is shared between two bonds) and complex bonds (where an amino acid interacts with more than one base simultaneously).

The distribution of hydrogen bonds with the nucleotide bases is 33% in single H-bonds and 67% in bidentate and complex interactions. These last two bond types are partially dependent on the conformation of the nucleotides, which confers specificity on the recognition process. The amino acid–base combinations involved in these interactions are as follows. For single hydrogen bonds, the amino acids that participate most are Arg, Ser and His. Bidentate interactions provide an explanation for the specificity of Arg and Lys for guanine and of Asn and Gln for adenine. Arg and Lys make two hydrogen bonds with the oxygen and nitrogen atoms of the guanine base, and Asn and Gln interact with adenine through two H-bonds involving nitrogen and oxygen atoms, one from A and the other to A. Gln also interacts with guanine in the minor groove. Other amino acids, such as Ser, Thr, His and Cys, do not participate in bidentate interactions with the bases. Thr is hindered by its methyl group, and Tyr is excluded for steric reasons. His has both an H-bond donor (NH) and an H-bond acceptor (N), but making a bidentate contact is very difficult because the hydrogen-bonding atoms point in opposite directions. However, His and Ser make bifurcated interactions with guanine. Regarding the complex hydrogen-bonding interactions, multiple combinations between acceptor and donor amino acids and bases were found. The amino acids Arg and Lys (double donor type) are the most involved, followed by Asp and Glu (double acceptor type) and Asn, Gln, Cys, Ser, Thr and Tyr (acceptor + donor type). There are two classes of complex H-bonds according to the positioning of the interacting bases. In the first class, the bases belong to the same strand and are stacked directly above one another; these bases are represented with a full point between them (e.g. G.G). In the second class, the bases belong to different strands and are situated diagonally to each other; these are represented with a slash (e.g. G/G).

Stair motifs in protein/DNA complexes

Other types of protein–DNA contacts, such as cation–π interactions, have also been explored. Wintjens et al. (2000) showed the existence of cation–π interactions by analysing 48 high-resolution X-ray structures of protein–DNA complexes and by quantum mechanics energy calculations. They found frequent proximity between the aromatic rings of nucleotide bases and the positively charged groups of amino acid side chains. The frequency of cation–π interactions in protein–DNA base complexes is presented in Figure 3. As can be seen, 55% of the cation–π interactions involve Arg, whereas 14% involve Lys, 18% Asn and 13% Gln. Lys is observed roughly as frequently as Asn and Gln in this type of interaction, despite the fact that the latter two carry only a partial positive charge, and Arg residues are three to four times more frequent than the other charged residues.
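The "three to four times" statement can be checked directly from the quoted percentages. The short sketch below does only that arithmetic; the names are illustrative, and the percentages are those given in the text rather than raw counts from Wintjens et al. (2000).

```python
# Share of cation-pi interactions per amino acid, as quoted in the text (percent).
CATION_PI_PERCENT = {"Arg": 55, "Lys": 14, "Asn": 18, "Gln": 13}


def ratio_to_arg(percentages):
    """Return how many times more frequent Arg is than each other residue."""
    arg = percentages["Arg"]
    return {res: arg / pct for res, pct in percentages.items() if res != "Arg"}


if __name__ == "__main__":
    for residue, ratio in ratio_to_arg(CATION_PI_PERCENT).items():
        print(f"Arg is {ratio:.1f}x as frequent as {residue}")
```

The ratios fall between about 3 and 4.2, which is consistent with the wording of the paragraph.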

Figure 3. Frequency of cation–π interactions at protein–DNA interfaces. The percentages indicate the occurrence of an amino acid–DNA base pair, which is computed as the product of the frequency of the amino acid observed in the set of 141 proteins used by Wintjens et al. (2000) and the frequency of the DNA bases: thymidine (rose), guanine (cyan), cytidine (red) and adenine (green).

The data presented also indicate that the propensity to take part in cation–π interactions increases in the order C, T, A, G. Furthermore, the most frequently observed pairs were Arg-G, followed by Arg-A, and the least frequent were Arg-C and Arg-T. For the bases T and C, which possess only a six-atom ring, the charged amino acid side chain is always located above the ethylenic bond C5–C6 (see Figure 1), which is accessible to the solvent in free DNA and situated on the major groove side. In the case of G and A, the charged amino acid side chain clearly prefers to be located above the imidazole ring, where it interacts specifically with the N7 or C8 atoms (see Figure 1). It must be noted, however, that most of the Arg residues that form cation–π interactions simultaneously form H-bonds with a contiguous DNA base. H-bond and cation–π interactions are thus strongly associated in the protein–DNA context (Rooman et al., 2002). The cation–π/H-bond interaction of Arg-G is illustrated in Figure 4.

Figure 4. Illustration of cation–π/H-bond interactions of Arg-G motifs based on reference (Rooman et al., 2002).

The concept of these stair motifs suggests that they might play a structural and/or functional role creating electron migration through the DNA causing long-range information transfer. This exciting issue will certainly enable future research work.

Comparison of interactions databases of protein–DNA interfaces

Hoffman et al. (2004) continued the analysis of protein–nucleic acid structures from the Protein Data Bank and constructed a database of amino acid interactions with DNA and RNA nucleotides (AANT). AANT covers only hydrogen bonds (H-bonds or CH···O H-bonds) between the nucleotide moieties (sugar, phosphate and base) and the amino acids (side chain and backbone).

In this discussion, we focus only on the interactions between amino acids and DNA nucleotides. The distribution of hydrogen bonds in AANT is very similar to that already reported by Luscombe et al. (2001). The proportions were sugar 2%, phosphate 72%, bases 26% for AANT, and sugar 2%, phosphate 66%, bases 32% for the latter study. The small discrepancies between the two studies are possibly due to the ways in which hydrogen-bonded contacts were counted or identified. In both studies, however, the phosphate groups were more involved in H-bonds than the sugar and base moieties.

The results provided by AANT were based on 930 solved structures of complexes, from which all H-bonds between amino acids and nucleotide moieties were extracted. χ2 tests and interatomic distances are considered, but there is no reference to an energetic ranking. To rationalize these data, the distribution of hydrogen bonds between amino acids and DNA nucleotides is presented in Figure 5. Other types of interactions, such as stacking interactions, van der Waals contacts and water-mediated bonds, are not included in this study.

Figure 5. Distribution of hydrogen bonds between amino acids and nucleotide moieties in AANT, as reported by Hoffman et al. (2004); the percentages indicate the fraction of total H-bonds between a particular amino acid and the four nucleotides: thymidine (rose), guanosine (cyan), cytidine (red) and adenosine (green).

The amino acids Lys and, above all, Arg are the most involved in H-bonds with all nucleotides, followed by Thr, Ser and Asn, while Cys and the hydrophobic amino acids are the least involved.

Some amino acids prefer certain nucleotides over the others: Asp and Glu prefer cytidine, Gln, Asn and Ser prefer adenosine, and Thr prefers thymidine. The preferred binding of Asp and Glu to cytidine may be due to its relatively positive charge, attributed to its single donor (in comparison to adenine, with both donor and acceptor atoms in the major groove). The amino acids Asn and Gln, in turn, contain donor and acceptor atoms that complement the acceptor and donor of adenine, and can form either single bonds or bridging contacts. Ser and Thr, with their hydroxyl group, have a clear preference to interact as donors. Meanwhile, the most preferred amino acid for guanosine is Arg, followed by Lys, because the acceptor atoms of guanosine in the major groove are well suited to making hydrogen bonds with the two donor atoms of Arg and the one of Lys. Furthermore, from the occurrence of nucleotide moieties interacting with amino acids presented in Figure 5, thymidine is the nucleotide that interacts most, followed by adenosine and cytidine, and, surprisingly, guanosine is the nucleotide that interacts least. This can be ascribed to the fact that thymidine and cytidine appear to be more favourable in H-bonds that involve sugar and phosphate atoms and less favourable in H-bonds that involve base edges, whereas for guanosine the opposite holds, owing to the two acceptor atoms in the purine base.

These structural analyses provide an example of the type of insights that may be possible by using AANT as a tool for studying protein–nucleic acid interactions. The objective of the authors was to predict and classify interactions other than hydrogen bonds and thus to introduce some improvements in AANT.

Subsequent databases of non-homologous protein–DNA and protein–RNA complexes were constructed from structures available in the Protein Data Bank. One example is the largest database, constructed by Lejeune et al. They analysed the electrostatic, H-bond, hydrophobic and van der Waals interactions of 139 protein–DNA and 49 protein–RNA complexes, distinguishing the atoms belonging to the amino acid side chain and to the nucleotide phosphate, sugar and base (Lejeune et al., 2005).

As with the previous studies, we focus only on the interactions between amino acids and DNA nucleotides. To rationalize these data, Figure 6 presents the most frequent amino acid–DNA nucleotide pairs, (a) Arg-G, Arg-C, Arg-T, Arg-A, (b) Lys-G, Lys-T, (c) His-G, (d) Asn-C, Asn-T, (e) Tyr-G, (f) Trp-C, together with the main interactions observed in this study.

Figure 6. Distribution of van der Waals, electrostatic, hydrophobic and H-bond interactions according to the nucleotide part for the most significant amino acid–DNA nucleotide pairs, as reported by Lejeune et al. (2005): (a) Arg-G, Arg-C, Arg-A, Arg-T, (b) Lys-G, Lys-T, (c) His-G, (d) Asn-C, Asn-T, (e) Tyr-G, (f) Trp-C.

Based on these data, Arg is the amino acid that interacts most, followed by Lys. The pair Arg-G is the most frequent: about 40% of its contacts are H-bonds between the atoms of the guanidinium group of Arg and the acceptor atoms of guanine, followed by electrostatic interactions, which make up 28% (see Figure 6a). In the Arg-C pair, the electrostatic interactions with the phosphate atoms are the most significant (≈30%), followed by van der Waals contacts between Arg and the deoxyribose and base atoms. The H-bonds between Arg and the cytidine base are less abundant (<10%) than in Arg-G, since the pyrimidine ring has only a single acceptor atom on the minor groove and a single donor atom on the major groove. The interactions with the sugar atoms are hydrophobic contacts involving the aliphatic atoms of the Arg side chain (see Figure 6a). The pairs Arg-A and Arg-T behave similarly to the Arg-C pair. About 40% of their contacts are electrostatic interactions between Arg and the phosphate atoms, and the low occurrence of H-bonds follows the decreasing order of hydrogen-bonding potential (see also Figure 6a). For Lys, in the same way as in Arg-G, the electrostatic interactions involving the phosphate atoms and the H-bonds between the amine hydrogen atoms and the two acceptor atoms of the guanine base are the most significant. For the hydrophobic contacts, 5 and 10% involve the sugar C–H with guanine and thymidine, respectively, and approximately 30% involve the methyl group of thymidine. The van der Waals contacts between Lys and the sugar or base atoms of guanine and thymidine are attributed to the positive charge of both (see Figure 6b). In graph (c) of Figure 6, the H-bonds between His and the oxygen atoms of phosphate (approximately 40%) or the acceptor atoms of guanine (>40%) are the most significant. The predominant interactions of Asn-C and Asn-T are H-bonds with the pyrimidine base and with the oxygen atom of the phosphate group. Van der Waals contacts involving the C–H of the deoxyribose were also prominent (see Figure 6d). Graph (e), for the pair Tyr-G, shows 40% of H-bonds between Tyr and the oxygen atom of the phosphate group and approximately 5% with the purine base. The latter are due to the presence of the hydroxyl group on the phenyl ring, which enables Tyr to make H-bonds with guanine. Hydrophobic contacts between the aromatic C–H of Tyr and the C–H of the sugar account for only 7%, and van der Waals contacts with the C–H of the sugar and with the base were approximately 10 and 15%, respectively.

Finally, regarding graph (f) of Figure 6, the interactions of Trp are mainly with cytidine: approximately 50% are H-bonds with the phosphate oxygen, 30% are hydrophobic contacts between the side chain and the sugar C–H, and 20% are van der Waals contacts with the pyrimidine base.

In summary, these results are similar to those obtained by others based on protein–DNA complexes. Hydrogen bonds with the phosphate are the most frequent, in contrast with those to the bases. Electrostatic interactions represent only 8% of the contacts between amino acids and the phosphate atoms of the nucleotides, but have been shown to be important for the stabilization of the complexes. This study also counts the hydrophobic contacts, which mainly involve sugar atoms and represent 19% of the contacts in amino acid–nucleotide complexes, and the van der Waals interactions, which involve sugar and base atoms and represent about 20%. Water-mediated contacts were not examined in this study because of the position of the water molecules in the structures of the complexes.

Thermodynamic and kinetic analyses of biomolecular interactions also reveal details of the energetic and dynamic features of molecular recognition processes, and complement structural analyses of the free and complexed conformations. One of the first studies determining thermodynamic parameters for nucleotides and some amino acids was published in 1974 (Wagner and Arfmann, 1974). The equilibrium constants determined in this study by equilibrium dialysis and circular dichroism show that Arg has the strongest affinities, in the order G (Ka = 14.5) > A (Ka = 9.4) > C (Ka = 5.3) > U (Ka = 4.6), while lysine shows a distinguishable affinity only for G, with Ka = 3.25 (Wagner and Arfmann, 1974).
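To relate these equilibrium constants to binding free energies, a minimal sketch using the standard relation ΔG° = −RT ln Ka is given below. It rests on two assumptions that are not stated in the text: that the Ka values are association constants expressed in M⁻¹ relative to a 1 M standard state, and that the temperature is 298.15 K. The function name `delta_g` is illustrative.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)

# Arg-nucleotide association constants quoted in the text (units assumed: M^-1).
KA_ARG = {"G": 14.5, "A": 9.4, "C": 5.3, "U": 4.6}


def delta_g(ka, temperature_k=298.15):
    """Standard binding free energy in kJ/mol from an association constant."""
    # Assumption: 298.15 K; the original study's temperature is not given here.
    return -R * temperature_k * math.log(ka)


if __name__ == "__main__":
    for base, ka in KA_ARG.items():
        print(f"Arg-{base}: Ka = {ka}, dG = {delta_g(ka):.2f} kJ/mol")
```

Under these assumptions every value is only a few kJ/mol, which reflects that isolated amino acid–nucleotide pairs bind weakly compared with full protein–DNA complexes.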

Advances in isothermal titration calorimetry (ITC) and surface plasmon resonance (SPR) techniques provide powerful tools for analysing protein–DNA interactions in thermodynamic and kinetic terms (Oda and Nakamura, 2000). The thermodynamic parameters are clearly different between specific and non-specific DNA binding. For example, the characteristic features of non-specific DNA binding are (i) positive ΔS, (ii) smaller negative ΔH, (iii) higher sensitivity of ΔH to ionic strength, and (iv) smaller ΔCp. Kinetic analyses have shown that the difference between specific and non-specific DNA binding is mainly due to the difference in dissociation rates. For instance, in specific binding, the specific contacts between the biomolecules can make the complex more stable and the dissociation rates slower. This is a characteristic feature of sequence specificity.
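For reference, the quantities named above are linked by the standard thermodynamic and kinetic relations sketched below. The symbols K_a, K_d, k_on and k_off do not appear in the text and are introduced here as assumptions, and the last relation holds only for a simple reversible 1:1 binding model.

```latex
\begin{align}
\Delta G^{\circ} &= \Delta H^{\circ} - T\,\Delta S^{\circ} = -RT \ln K_a \\
\Delta C_p &= \left( \frac{\partial \Delta H^{\circ}}{\partial T} \right)_p \\
K_d &= \frac{1}{K_a} = \frac{k_{\mathrm{off}}}{k_{\mathrm{on}}}
\end{align}
```

Read this way, at comparable association rates a slower dissociation rate gives a smaller K_d, which is consistent with the observation that specific complexes differ from non-specific ones mainly in their dissociation rates.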

More recently, Marabotti et al. (2008) determined the frequency and energy of interaction between each amino acid and base of 100 high-resolution protein–DNA complexes.

Moreover, they estimated the energetic contribution of the water molecules to the binding of amino acid–base pairs. A geometry-based tool was used to evaluate hydrogen bond interactions and to determine the optimal geometrical parameters (distance and angle) that contribute to complex formation and stabilization. No estimate of the phosphate and sugar contribution was provided: only the specific interactions between the amino acids and the nucleotide bases at the major and minor grooves were taken into account.

The energetically most significant amino acid–base pairs are Arg-G, Asn-A, Asp-C, Gln-A, Glu-C and Lys-G, and also His-G and Ser-G. Preferences are also marked for Ala-C, Cys-G, Gly-G, Leu-A, Thr-G and Trp-C, while no apparent base preference is found for Ile, Phe and Tyr. In general, the hydrogen-bond donor atoms of Arg, Cys, His, Lys, Ser and Thr prefer guanine, while the hydrogen-bond acceptor atoms of Asp and Glu prefer cytidine. Asn and Gln, having both donor and acceptor moieties, prefer adenine. Thymidine has the highest frequency of interaction with amino acids such as Arg, Lys and Phe, and also with Asn, Gln, Asp and Glu.
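The pairings listed above can be transcribed into a small lookup table and grouped by base, which makes it easy to see that the guanine group matches the donor residues named in the paragraph. This is only an illustrative data structure built from the pairs quoted in the text; it is not the computational tool of Marabotti et al. (2008), and the names `PREFERRED_BASE` and `group_by_base` are hypothetical.

```python
from collections import defaultdict

# Preferred base per amino acid, transcribed from the pairs quoted in the text.
# None marks residues for which no apparent base preference is reported.
PREFERRED_BASE = {
    "Arg": "G", "Asn": "A", "Asp": "C", "Gln": "A", "Glu": "C", "Lys": "G",
    "His": "G", "Ser": "G", "Ala": "C", "Cys": "G", "Gly": "G", "Leu": "A",
    "Thr": "G", "Trp": "C", "Ile": None, "Phe": None, "Tyr": None,
}


def group_by_base(preferences):
    """Return {base: sorted list of amino acids that prefer it}."""
    groups = defaultdict(list)
    for amino_acid, base in preferences.items():
        if base is not None:  # skip residues without a reported preference
            groups[base].append(amino_acid)
    return {base: sorted(residues) for base, residues in groups.items()}


if __name__ == "__main__":
    for base, residues in sorted(group_by_base(PREFERRED_BASE).items()):
        print(f"{base}: {', '.join(residues)}")
```

In this transcription no amino acid has thymidine as its energetically preferred base, even though thymidine is reported to interact frequently, which highlights the difference between frequency and energetic preference.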

As previously mentioned, Marabotti et al. (2008) reported an advanced computational tool that allowed the energetic contribution of bridging water molecules to the interaction of each amino acid–base pair to be determined. Luscombe et al. (2001) also reported the contribution of water-mediated bonds by analysing the crystallographic waters located inside the protein–DNA interfaces. They concluded that a water molecule can make more than two hydrogen bonds simultaneously.

The energetic contribution played by water molecules in amino acid–base recognition is summarized with the CWater coefficients in Figure 7. This enhancement factor indicates the extent by which a defined amino acid increases its interaction energy with a specific base via a bridging water molecule.

Figure 7. Distribution of the water-mediated interactions between amino acids and nucleotide bases, as reported by Marabotti et al. (2008). Bases are identified by: thymidine (rose), guanine (cyan), cytidine (red) and adenine (green).

Regarding the effect of water mediation on the amino acid–base interaction (see Figure 7), the pairs where water mediation exhibits a significant effect are Thr-A, Lys-A, Lys-C, Asp-A, Asn-G, Tyr-A, Thr-C, Thr-A, Asp-G, Gln-G, Ser-A, Tyr-C, Arg-C and Arg-A. The interactions of the bases with Asn and Asp are more significant than those with Gln and Glu, probably because their shorter side chains require water mediation to interact effectively with the bases. Furthermore, interactions with the three hydroxyl amino acids, Ser, Thr and Tyr, are also markedly enhanced by water. In contrast, no significant bridging water molecules were found for Arg-G, possibly because the hydrogen bond-donating atoms of Arg easily interact with carbonyl acceptors in bases without water mediation; the flexibility of its long side chain is an additional advantage.

It is interesting to note that the interactions of the amino acids with adenine are markedly enhanced relative to the others, especially for H-bond donor atoms. This is probably because the water molecules decrease the electrostatic repulsion between the donor atoms of adenine and the side chains of the amino acids. It should also be noted that the interactions with thymidine were decreased, presumably because the hydrophobic part of this base cannot make favourable interactions with water molecules.

Marabotti et al. (2008) also evaluated specific amino acid–base binding by hydrogen bonding in terms of energetic preferences. The high bond energy of the pairs Arg-G, Lys-G and Asn-A determined by these authors, was explained by the formation of bidentate interactions which increase the specificity of the recognition process. In contrast, single and complex hydrogen bonds, van der Waals interactions and water-mediated contacts are strongly dependent on the conformation of the interacting biomolecules and usually produce context-dependent specificities.

AMINO ACID–NUCLEIC ACIDS INTERACTION IN AFFINITY CHROMATOGRAPHY

The molecular interactions occurring between proteins and DNA were explained at the atomic level by evaluating the contacts between their respective basic units, the amino acids and the nucleotides. On the basis of this description, the possibility of exploiting the favoured amino acid–nucleic acid interactions to develop a chromatographic methodology to purify plasmid DNA was evaluated.

Advances in the medical and pharmaceutical fields are needed to provide increasingly accurate and efficient answers to currently untreatable pathologies. Biotechnological research has to meet these requirements and exploit the possibilities of establishing new processes and methodologies to overcome these challenges. Research on gene therapy and DNA vaccination (Anderson and Schneider, 2005; Bråve et al., 2007) is being intensified, and the development of resourceful, practical, efficient and consistent production and purification strategies is thus required (Stadler et al., 2004).

The clinical application of plasmid DNA (pDNA) requires its recovery as a highly pure product. Thus, the majority of pDNA purification protocols are based on liquid chromatography in order to meet the strict quality criteria established by regulatory agencies (Diogo et al., 2005).

Recently, a new affinity chromatography approach, named amino acid–DNA affinity chromatography (Sousa et al., 2008a), was implemented to purify supercoiled (sc) pDNA (Table 2).

Table 2. Characterization of amino acid–nucleic acids interaction in affinity chromatography
Amino acid ligand | Global interaction conditions | Order of favoured interactions | General application | Reference
His | High salt concentration to promote nucleic acids retention | RNA–sc pDNA–gDNA–oc pDNA | Isolation of pDNA isoforms; purification of sc pDNA from lysate | Sousa et al.; Sousa et al.
Arg | Mild conditions; nucleic acids bind to the matrix at low salt concentration | sc pDNA–RNA–gDNA–oc pDNA | Isolation of pDNA isoforms; purification of sc pDNA from lysate | Sousa et al.; Sousa et al.
Lys | Mild conditions; nucleic acids bind to the matrix at low salt concentration | sc pDNA–oc pDNA | Isolation of pDNA isoforms | Sousa et al. (2009a)

The impact of plasmid topology on transfection and expression efficacy has received some attention, because it is a crucial issue for the therapeutic results achieved by gene therapy or DNA vaccination. Supercoiled pDNA is often considered more effective at gene transfer and expression than the open circular (oc) and linear variants (Cupillard et al., 2005), which explains the interest in isolating this plasmid isoform. In addition, the supercoiled isoform is often associated with the circular DNA found in bacteria; however, supercoiling is not restricted to small, circular DNAs but also occurs in linear, eukaryotic DNA, where it is important for several biological processes. For example, negative supercoiling plays a key role in the organization of DNA in chromosomes, as discussed above. Furthermore, because negatively supercoiled DNA is underwound, it exerts a force that helps separate the two strands of the helix, which is required during both replication and transcription (Clark, 2005). As previously described, a number of protein–DNA complexes only occur because the binding is promoted by negative supercoiling due to stabilization of writhing (Palecek et al., 2004). Recent studies suggest a possible relation between p53 function and DNA topology. Palecek et al. (2004) hypothesized that the degree of DNA supercoiling may play a significant role in the complex p53 regulatory network.

Several chromatographic methodologies, such as size-exclusion, ion-exchange, hydrophobic interaction and affinity chromatography, have already been applied, either as an isolated step or integrated in an overall purification process (Diogo et al., 2005). The potential of affinity chromatography lies in its unique use of a specific binding agent to analyse or purify biomolecules on the basis of their biological function or individual chemical structure (Kanoun et al., 1986; Lowe et al., 2001). The affinity technique recently developed to purify plasmids therefore combines the natural occurrence of interactions between proteins and nucleic acids with the use of amino acids as simpler and more stable ligands than complex proteins (Sousa et al., 2008a). The idea of using amino acids was also supported by the published computational models explaining the atomic and molecular forces predicted to occur between different amino acids and the different regions of nucleotides (Luscombe et al., 2001; Hoffman et al., 2004).

The screening of the ability of different amino acids, such as Leu, Gln, Asp, His, Arg and Lys, to isolate sc pDNA showed that this biomolecule interacts selectively with His (Sousa et al., 2005), Arg (Sousa et al., 2008b) and Lys (Sousa et al., 2009a). The application of His as ligand revealed a different recognition for the different nucleic acids, with the structure and topology of these biomolecules being one of the major factors influencing the interaction. In general, a stronger interaction occurs with RNA, which can be explained by the single-stranded structure of this molecule, which favours the exposure of and access to the nucleotide bases. In contrast, genomic DNA (gDNA) was less retained, revealing a weaker affinity for His. This could be due to the double-stranded structure of this biomolecule, which shields the contact surfaces (Sousa et al., 2006). The retention pattern of pDNA was described as mainly dependent on the conformation of the plasmid molecules, with His interacting preferentially with the supercoiled isoform (Sousa et al., 2005). The oc isoform did not specifically interact with His, and the retention profiles of oc pDNA and gDNA were similar. These different interactions allowed His to be applied as a specific ligand in the development of an affinity support. This recognition provided baseline separation of sc and oc pDNA isoforms, using a decreasing ammonium sulphate gradient (Sousa et al., 2005). In addition, the applicability of His affinity chromatography to efficiently purify sc pDNA from host impurities present in a clarified E. coli lysate was further demonstrated (Sousa et al., 2006). As detailed in the previous section, the interaction of His with the DNA bases may include hydrogen bonding, ring stacking/hydrophobic interactions and water-mediated H-bonds, and the mechanism behind the specific interaction with the bases of sc pDNA was explained as a consequence of deformations induced by torsional strain (Sousa et al., 2007a). Similar to what was described in the cellular mechanisms section about the influence of DNA structure on protein–DNA interactions, in this case the topology of the pDNA also directly affected the interaction with amino acids, and the higher degree of exposure of the sc bases favoured the interaction with the His ligand. The influence of the plasmid topology on the interaction with His (Sousa et al., 2007b) was also described in a study where plasmid retention was tested at different temperatures. It was verified that, as the temperature increases, the plasmid undergoes pre-denaturation conformational changes that promote the removal of negative superhelical turns in sc pDNA molecules. As a consequence, the interaction of the DNA bases with the His ligands decreases (Sousa et al., 2007a). A fundamental study performed with synthetic oligonucleotides showed that the presence of secondary structures in polyA and polyG oligonucleotides has a significant influence on retention, similar to what is observed with plasmids. In this study, it was also verified that His interacts preferentially with the guanine and adenine bases (Sousa et al., 2009b), as described at the atomic level (Luscombe et al., 2001; Hoffman et al., 2004).

Similar studies using Arg (Sousa et al., 2008b) and Lys (Sousa et al., 2009a) chromatography to purify pDNA also demonstrated a specific interaction with plasmid molecules and a particular recognition of the sc isoform. Considering the previous discussion of protein–DNA interactions at the atomic level, the interaction between pDNA and the Arg support was expected to be rather complex: apart from the stability promoted by the interaction with the pDNA backbone, the ability of Arg-agarose to distinguish and differentially interact with both isoforms suggested a specific recognition of the sc pDNA isoform. The characteristics of Arg, namely its ability to interact in different conformations, the length of its side chain and its ability to produce good hydrogen-bonding geometries, also point to the possibility of specific recognition mechanisms (Luscombe et al., 2001; Hoffman et al., 2004). The investigation of the retention mechanism with synthetic oligonucleotides showed that, although electrostatic interaction plays an important role in the retention of single-stranded oligonucleotides, the interaction of double-stranded oligonucleotides with the Arg support decreased significantly as a result of the reduced exposure of the bases (Sousa et al., 2009c). It was therefore considered that the superhelicity of sc pDNA favours multiple-contact, complex interactions with Arg. Furthermore, the multiple interactions that the Arg-based matrix was able to promote allowed the differential recognition of the biomolecules present in E. coli lysates, an important insight for the pDNA purification process (Sousa et al., 2009d). In addition, as the recognition of pDNA by Arg is stronger than that achieved with His, it was possible to use mild chromatographic conditions, which is an advantage for the integrity and stability of the final plasmid intended for therapeutic application. An interesting finding related to the interaction between Arg and nucleic acids was that isolated RNA presented a relatively strong interaction with Arg. However, when a more complex extract containing RNA and pDNA is analysed, the interaction of RNA with Arg decreases and a preferential interaction is obtained with the sc plasmid (Sousa et al., 2009d). The evaluation of the efficiency of His- and Arg-based chromatography in purifying pDNA was completed by a quality control of the recovered plasmid product. The plasmid product obtained with these affinity purification methodologies was characterized and its quality assured: gDNA and endotoxins were reduced to acceptable levels and no RNA or proteins were detected (Sousa et al., 2006; Sousa et al., 2009d). Transfection experiments with purified pDNA also confirmed an efficient expression, around 50–65%, of the target gene in eukaryotic cells, an interesting result for therapeutic applications.

CONCLUSIONS

In this review, the protein–DNA complexes and the interactions involved at different levels were discussed. The data presented show that an affinity chromatography approach based on naturally occurring interactions between amino acids and nucleic acids, aimed at purifying the biologically active sc plasmid isoform, is feasible. Furthermore, the recognition phenomena observed in many cellular processes are explained by a network of molecular interactions that confer affinity and specificity on the protein–DNA complexes. The exploitation of these affinity interactions can also improve different analytical techniques that are relevant to a wide range of approaches.

Based on the currently available data, this review also presented the specific interactions between amino acids and nucleotide moieties that are responsible for molecular recognition, in particular hydrogen bonding, electrostatic interactions, van der Waals and hydrophobic contacts, water-mediated bonds, cation–π and stacking interactions.

Generally, hydrogen bonds display the strongest specificity. This is interpreted in terms of the directionality of hydrogen bonding, which is fundamental for specific recognition. These bonds include the interactions of Arg, Lys, His and Ser with G, and of Asn and Gln with A. In addition, amino acid residues with more than one side-chain hydrogen-bonding atom can produce bidentate or bifurcated H-bonds, as well as complex H-bonds with multiple bases simultaneously; for example, Arg and Lys can contact two adjacent bases presenting acceptor atoms, which enables specificity. Electrostatic interactions between positively charged amino acids and phosphate oxygens are also important for the stabilization of complexes, as are hydrophobic contacts between sugar atoms and the atoms of aliphatic or aromatic amino acid side chains. It is also noted that there are favourable amino acid–base pairings involving van der Waals contacts, water-mediated interactions and cation–π interactions, which play an important role in molecular recognition processes.

REFERENCES
