Genome-wide Inference of Transcription Factor–DNA Binding Specificity in Cell Regeneration Using a Combination Strategy

Authors

  • Xiaofeng Wang,

    1. Institute of Hepatobiliary Surgery, Southwest Hospital, Third Military Medical University, Chongqing 400010, China
    2. Department of General Surgery, CAPF General Hospital 100039, Beijing, China
  • Aiqun Zhang,

    1. Department of Hepatobiliary Surgery, PLA General Hospital 100853, Beijing, China
  • Weizheng Ren,

    1. Department of Hepatobiliary Surgery, PLA General Hospital 100853, Beijing, China
  • Caiyu Chen,

    1. Institute of Hepatobiliary Surgery, Southwest Hospital, Third Military Medical University, Chongqing 400010, China
  • Jiahong Dong

    Corresponding author
    1. Institute of Hepatobiliary Surgery, Southwest Hospital, Third Military Medical University, Chongqing 400010, China
    2. Department of Hepatobiliary Surgery, PLA General Hospital 100853, Beijing, China
      Corresponding author: Jiahong Dong, dongjh301@163.com

Abstract

Cell growth, development, and the regeneration of tissues and organs are associated with a large number of gene regulation events, which are mediated in part by transcription factors (TFs) binding to cis-regulatory elements in the genome. Predicting the binding affinity and inferring the binding specificity of TF–DNA interactions at the genomic level would substantially aid our understanding of the molecular mechanisms and biological implications underlying sequence-specific TF–DNA recognition. In this study, we report the development of a combination method to characterize the interaction behavior of an 11-mer oligonucleotide segment and its mutants with the Gcn4p protein, a homodimeric basic leucine zipper TF, and to predict the binding affinity and specificity of potential Gcn4p binders on a genome-wide scale. In this procedure, a position-mutated energy matrix is created from molecular modeling analysis of native and mutated Gcn4p–DNA complex structures to describe the position-independent interaction energy profile of Gcn4p with each nucleotide type at each position of the oligonucleotide. The energy terms extracted from the matrix and their interaction terms are then correlated with the experimentally measured affinities of 19 268 distinct oligonucleotides using statistical modeling. Subsequently, the best of the resulting regression models is successfully applied to screen the complete genome for potential high-affinity Gcn4p binders.
The main findings of this study are as follows: (i) the 11 positions of the oligonucleotides are highly interactive and contribute non-additively to Gcn4p–DNA binding affinity; (ii) indirect conformational effects of nucleotide mutations, together with the associated subtle changes in interfacial atomic contacts, rather than direct nonbonded interactions, are primarily responsible for sequence-specific recognition; (iii) intrinsic synergistic effects among the sequence positions of the oligonucleotides determine Gcn4p–DNA binding affinity and specificity; (iv) linear regression models combined with variable selection appear to perform fairly well in capturing the internal dependences hidden in the Gcn4p–DNA system, although ignoring nonlinear factors may lead the models to systematically underestimate high-affinity samples and overestimate low-affinity samples.
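
To make the modeling idea above concrete, the following is a minimal sketch of how an 11-mer might be turned into regression features: one energy term per position looked up in an energy matrix, plus pairwise interaction terms. It is an illustration under stated assumptions, not the procedure used in the study. The matrix values are arbitrary placeholders, the interaction terms are assumed to be simple pairwise products, and all function names (make_placeholder_matrix, energy_terms, interaction_terms, feature_vector, predict_affinity) are hypothetical.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"
SITE_LENGTH = 11  # the abstract studies 11-mer oligonucleotides


def make_placeholder_matrix(length=SITE_LENGTH):
    """Build a made-up energy matrix: matrix[position][nucleotide] -> energy.

    The numbers are arbitrary placeholders, NOT energies from the study.
    """
    return [
        {nt: -1.0 - 0.1 * pos - 0.25 * idx for idx, nt in enumerate(NUCLEOTIDES)}
        for pos in range(length)
    ]


def energy_terms(sequence, matrix):
    """Look up one energy term per sequence position."""
    if len(sequence) != len(matrix):
        raise ValueError("sequence length must match the matrix length")
    return [matrix[pos][nt] for pos, nt in enumerate(sequence)]


def interaction_terms(terms):
    """Pairwise products of single-position terms (an assumed form)."""
    return [a * b for a, b in combinations(terms, 2)]


def feature_vector(sequence, matrix):
    """Concatenate single-position terms and pairwise interaction terms."""
    single = energy_terms(sequence, matrix)
    return single + interaction_terms(single)


def predict_affinity(features, coefficients, intercept=0.0):
    """Linear model: intercept plus the weighted sum of the features.

    Variable selection would correspond to setting most coefficients to zero.
    """
    if len(features) != len(coefficients):
        raise ValueError("features and coefficients must have equal length")
    return intercept + sum(c * x for c, x in zip(coefficients, features))


if __name__ == "__main__":
    matrix = make_placeholder_matrix()
    features = feature_vector("ACGTACGTACG", matrix)  # arbitrary example 11-mer
    print(len(features))  # 11 single terms + 55 pairwise terms
```

With 11 positions this yields 11 single-position terms and 55 pairwise terms. In a real analysis the coefficients would be fitted to the measured affinities, and variable selection would retain only the informative terms.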
