Proteopedia is an online encyclopedia designed to illustrate the three-dimensional (3D) structures of proteins, nucleic acids, and their complexes [1, 2]. To better describe protein–DNA binding at the molecular level, we created two Proteopedia pages (http://proteopedia.org/w/Hox-DNA_Recognition and http://proteopedia.org/w/P53-DNA_Recognition), where we annotated and visualized key interactions at protein–protein and protein–DNA interfaces within protein–DNA complexes for the Hox and p53 transcription factors (Fig. 1). The p53 tumor suppressor plays an instrumental role in the body's defense against stress and cancer by promoting distinct biological responses, while Hox proteins are crucial transcription factors that regulate the development of distinct segments of the embryo.
Protein–DNA readout modes have traditionally been classified into “direct and indirect readout,” terms coined when the very first crystal structures of protein–DNA complexes were solved and analyzed. Direct readout originally referred to hydrogen bonds between amino acids and bases, and indirect readout to water-mediated hydrogen bonds. Since then, the 3D structures of thousands of protein–DNA complexes have become available, shedding light on many variations of protein–DNA recognition. Different protein families bind to DNA with various levels of specificity, from non-specific to highly specific, which can be achieved through a variety of means. Some proteins bind alone, while others recruit cofactors or bind as dimers or tetramers. Proteins can contact the DNA in either the major or the minor groove, or bind to both grooves at once. Some proteins leave the intrinsic DNA structure essentially intact, while others drastically deform the DNA. The term direct readout is not restricted to sequence-specific hydrogen bonds or hydrophobic contacts with functional groups of bases. Indirect readout, in turn, has become a catch-all term that is poorly defined. Thus, a classification into the two categories, direct and indirect readout, is simply not extensive enough to reflect our current knowledge of thousands of structures.
To assist researchers and educators, we have recently proposed a new classification scheme for protein–DNA readout modes into the more descriptive categories of “base and shape readout,” which allow subsequent sub-classifications such as local and global shape readout, and major and minor groove base readout [3, 4]. We illustrated that Drosophila Hox proteins recruit a cofactor, Exd (Extradenticle), to achieve in vivo specificity, a mechanism that we call latent specificity. In contrast to Hox proteins, human p53 binds to DNA as a tetramer, and thereby engages in cooperative protein–protein interactions. Both the Hox and p53 transcription factors engage in hydrogen bonding between protein side chains and major groove edges of the base pairs. In particular, bidentate hydrogen bonds between arginine side chains and guanine bases contribute to highly specific sequence readout. Similarly, both transcription factors use shape readout of minor groove geometry and electrostatic potential to enhance binding specificity. However, the mechanisms through which the shapes of their binding sites are modulated differ (Fig. 1). Hox binding sites are recognized based on their sequence-dependent minor groove shape. In contrast, it has been observed that the DNA targets of p53 alter their shape through a transition of some base pairs from Watson–Crick to Hoogsteen geometry. It remains to be seen, however, whether the Hoogsteen geometry is recognized by p53 as an intrinsic feature of DNA or whether it is induced through the formation of the complex.
Protein structures have been classified in much detail over the years, while the same cannot be said for protein–DNA recognition. Under the new classification scheme, which was already mentioned in a textbook, we are now able to analyze protein–DNA readout modes in more detail and integrate them using 3D visualization to better understand molecular interactions. Proteopedia provides the framework for utilizing intricate details of protein–DNA recognition in both research and education.