Proteopedia: 3D visualization and annotation of transcription factor–DNA readout modes

Authors

  • Ana Carolina Dantas Machado,

    1. Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, and Physics and Astronomy, University of Southern California, 1050 Childs Way, Los Angeles, California 90089
  • Skyler B. Saleebyan,

    1. La Cañada High School, 4463 Oak Grove Drive, La Cañada Flintridge, California 91011
    • Skyler B. Saleebyan, Bailey T. Holmes, Maria Karelina, Julia Tam, Sharon Y. Kim, and Keziah H. Kim contributed equally to this work.

  • Bailey T. Holmes,

    1. La Cañada High School, 4463 Oak Grove Drive, La Cañada Flintridge, California 91011

  • Maria Karelina,

    1. La Cañada High School, 4463 Oak Grove Drive, La Cañada Flintridge, California 91011

  • Julia Tam,

    1. La Cañada High School, 4463 Oak Grove Drive, La Cañada Flintridge, California 91011

  • Sharon Y. Kim,

    1. La Cañada High School, 4463 Oak Grove Drive, La Cañada Flintridge, California 91011

  • Keziah H. Kim,

    1. Flintridge Preparatory School, 4543 Crown Avenue, La Cañada Flintridge, California 91011

  • Iris Dror,

    1. Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, and Physics and Astronomy, University of Southern California, 1050 Childs Way, Los Angeles, California 90089
  • Eran Hodis,

    1. Harvard/MIT MD-PhD Program, Harvard Medical School, 25 Shattuck Street, Boston, Massachusetts 02115
  • Eric Martz,

    1. Department of Microbiology, University of Massachusetts, 639 North Pleasant Street, Amherst, Massachusetts 01003
  • Patricia A. Compeau,

    1. La Cañada High School, 4463 Oak Grove Drive, La Cañada Flintridge, California 91011
  • Remo Rohs

    Corresponding author
    1. Molecular and Computational Biology Program, Departments of Biological Sciences, Chemistry, and Physics and Astronomy, University of Southern California, 1050 Childs Way, Los Angeles, California 90089

Abstract

3D visualization assists in identifying diverse mechanisms of protein–DNA recognition that can be observed for transcription factors and other DNA-binding proteins. We used Proteopedia to illustrate transcription factor–DNA readout modes with a focus on DNA shape, which can be a function of either nucleotide sequence (Hox proteins) or base pairing geometry (p53).

© 2012 by The International Union of Biochemistry and Molecular Biology

Proteopedia is an online encyclopedia designed to illustrate the three-dimensional (3D) structures of proteins, nucleic acids, and their complexes [1, 2]. To better describe protein–DNA binding at the molecular level, we created two Proteopedia pages (http://proteopedia.org/w/Hox-DNA_Recognition and http://proteopedia.org/w/P53-DNA_Recognition), where we annotated and visualized key interactions at protein–protein and protein–DNA interfaces within protein–DNA complexes for the Hox and p53 transcription factors (Fig. 1). The p53 tumor suppressor plays an instrumental role in the body's defense against stress and cancer by promoting distinct biological responses, while Hox proteins are crucial transcription factors that regulate the development of distinct segments of the embryo.

Figure 1.

The crystal structures of a Hox-Exd-DNA ternary complex (PDB ID 2R5Z) and a complex of a p53 DNA-binding-domain tetramer and its DNA target (PDB ID 3KZ8) form distinct protein–protein and protein–DNA interactions. Hox proteins bind DNA with a cofactor (Exd) while p53 forms a tetramer. Whereas both systems use the mechanisms of base and shape readout, DNA shape is a sequence-dependent feature in Hox binding sites and is modulated through the alteration of the base pairing geometry in a p53 consensus site.

Protein–DNA readout modes have traditionally been classified into “direct and indirect readout,” terms coined when the very first crystal structures of protein–DNA complexes were solved and analyzed. Direct readout originally referred to hydrogen bonds between amino acids and bases, and indirect readout to water-mediated hydrogen bonds [3]. Since then, the 3D structures of thousands of protein–DNA complexes have become available, shedding light on many variations of protein–DNA recognition. Different protein families bind to DNA with various levels of specificity, from non-specific to highly specific, which can be achieved through a variety of means [3]. Some proteins bind alone, while others recruit cofactors or bind as dimers or tetramers. Proteins can contact the DNA in either the major or the minor groove, or bind to both grooves at once. Some proteins leave the intrinsic DNA structure essentially intact, while others drastically deform the DNA. The term direct readout is not restricted to sequence-specific hydrogen bonds or hydrophobic contacts with functional groups of bases. Indirect readout, in turn, has become a poorly defined catch-all term. Thus, a classification into only two categories, direct and indirect readout, is not extensive enough to reflect our current knowledge of thousands of structures.

To assist researchers and educators, we have recently proposed a new classification scheme for protein–DNA readout modes into the more descriptive categories of “base and shape readout,” which allow subsequent sub-classifications such as local and global shape readout, and major and minor groove base readout [3, 4]. We illustrated that Drosophila Hox proteins recruit a cofactor, Exd (Extradenticle), to achieve in vivo specificity, a mechanism that we call latent specificity [5]. In contrast to Hox proteins, human p53 binds to DNA as a tetramer, and thereby engages in cooperative protein–protein interactions [6]. Both the Hox and p53 transcription factors engage in hydrogen bonding between protein side chains and major groove edges of the base pairs. In particular, bidentate hydrogen bonds between arginine side chains and guanine bases contribute to highly specific sequence readout. Similarly, both transcription factors use shape readout of minor groove geometry and electrostatic potential to enhance binding specificity [7]. However, the mechanisms through which the shapes of their binding sites are modulated differ (Fig. 1). Hox binding sites are recognized based on their sequence-dependent minor groove shape [8]. In contrast, it has been observed that the DNA targets of p53 alter their shape through a transition of some base pairs from Watson–Crick to Hoogsteen geometry [6]. It remains to be seen, however, whether the Hoogsteen geometry is recognized by p53 as an intrinsic feature of DNA or whether it is induced through the formation of the complex [9].
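For readers who want to work with these annotations programmatically, the two-level scheme described above can be written down as a small data structure. The following Python sketch is illustrative only and is not part of Proteopedia or of the original classification papers: the names `ReadoutMode`, `SubClass`, `Readout`, `ANNOTATIONS`, and `shared_modes` are hypothetical. The entries record only what is stated in the text for the two structures in Fig. 1, and the shape readout sub-class is left unset because the text does not assign one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ReadoutMode(Enum):
    """Top-level categories of the base/shape readout scheme."""
    BASE = "base readout"
    SHAPE = "shape readout"


class SubClass(Enum):
    """Sub-classifications named in the text."""
    MAJOR_GROOVE = "major groove"  # sub-class of base readout
    MINOR_GROOVE = "minor groove"  # sub-class of base readout
    LOCAL = "local"                # sub-class of shape readout
    GLOBAL = "global"              # sub-class of shape readout


# Which sub-classes are permitted under each top-level mode.
ALLOWED = {
    ReadoutMode.BASE: {SubClass.MAJOR_GROOVE, SubClass.MINOR_GROOVE},
    ReadoutMode.SHAPE: {SubClass.LOCAL, SubClass.GLOBAL},
}


@dataclass(frozen=True)
class Readout:
    mode: ReadoutMode
    subclass: Optional[SubClass] = None  # None: not specified in the text
    note: str = ""

    def __post_init__(self):
        # Reject combinations such as "local base readout".
        if self.subclass is not None and self.subclass not in ALLOWED[self.mode]:
            raise ValueError(
                f"{self.subclass.value} is not a sub-class of {self.mode.value}"
            )

    def label(self) -> str:
        if self.subclass is None:
            return self.mode.value
        return f"{self.subclass.value} {self.mode.value}"


# Hypothetical annotation table keyed by the PDB IDs shown in Fig. 1.
ANNOTATIONS = {
    "2R5Z": [  # Hox-Exd-DNA ternary complex
        Readout(ReadoutMode.BASE, SubClass.MAJOR_GROOVE,
                "hydrogen bonds to major groove edges of base pairs"),
        Readout(ReadoutMode.SHAPE, None,
                "sequence-dependent minor groove shape"),
    ],
    "3KZ8": [  # p53 DNA-binding-domain tetramer with its DNA target
        Readout(ReadoutMode.BASE, SubClass.MAJOR_GROOVE,
                "hydrogen bonds to major groove edges of base pairs"),
        Readout(ReadoutMode.SHAPE, None,
                "shape altered by Watson-Crick to Hoogsteen transition"),
    ],
}


def shared_modes(pdb_a: str, pdb_b: str) -> set:
    """Return the top-level readout modes annotated for both structures."""
    modes_a = {r.mode for r in ANNOTATIONS[pdb_a]}
    modes_b = {r.mode for r in ANNOTATIONS[pdb_b]}
    return modes_a & modes_b
```

Calling `shared_modes("2R5Z", "3KZ8")` returns both top-level modes, which mirrors the observation that the two systems use the same readout categories and differ only in how the DNA shape arises, a difference this sketch records in the free-text `note` field.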

Protein structures have been classified in much detail over the years, while the same cannot be said for protein–DNA recognition. Under the new classification scheme, which has already been mentioned in a textbook [10], we are now able to analyze protein–DNA readout modes in more detail and integrate them using 3D visualization to better understand molecular interactions. Proteopedia provides the framework for utilizing intricate details of protein–DNA recognition in both research and education.

Acknowledgements

The authors acknowledge advice and help from Proteopedia editors J. Prilusky and J.L. Sussman of the Weizmann Institute of Science in Rehovot, Israel. This work is a result of the Bioinformatics Institute, which was established in 2011 at La Cañada High School as one of its Institutes of the 21st Century in partnership with the Rohs laboratory at the University of Southern California. This project was supported by USC start-up funds.
