• biomolecular recognition;
  • cell regeneration;
  • DNA;
  • Gcn4p protein;
  • transcription factor

The cell growth, development, and regeneration of tissue and organ are associated with a large number of gene regulation events, which are mediated in part by transcription factors (TFs) binding to cis-regulatory elements involved in the genome. Predicting the binding affinity and inferring the binding specificity of TF–DNA interactions at the genomic level would be fundamentally helpful for our understanding of the molecular mechanism and biological implication underlying sequence-specific TF–DNA recognition. In this study, we report the development of a combination method to characterize the interaction behavior of a 11-mer oligonucleotide segment and its mutations with the Gcn4p protein, a homodimeric, basic leucine zipper TF, and to predict the binding affinity and specificity of potential Gcn4p binders in the genome-wide scale. In this procedure, a position-mutated energy matrix is created based on molecular modeling analysis of native and mutated Gcn4p–DNA complex structures to describe the position-independent interaction energy profile of Gcn4p with different nucleotide types at each position of the oligonucleotide, and the energy terms extracted from the matrix and their interactives are then correlated with experimentally measured affinities of 19 268 distinct oligonucleotides using statistical modeling methodology. Subsequently, the best one of built regression models is successfully applied to screen those of potential high-affinity Gcn4p binders from the complete genome. The findings arising from this study are briefly listed below: (i) The 11 positions of oligonucleotides are highly interactive and non-additive in contribution to Gcn4p–DNA binding affinity; (ii) Indirect conformational effects upon nucleotide mutations as well as associated subtle changes in interfacial atomic contacts, but not the direct nonbonded interactions, are primarily responsible for the sequence-specific recognition; (iii) The intrinsic synergistic effects among the sequence positions of oligonucleotides determine Gcn4p–DNA binding affinity and specificity; (iv) Linear regression models in conjunction with variable selection seem to perform fairly well in capturing the internal dependences hidden in the Gcn4p–DNA system, albeit ignoring nonlinear factors may lead the models to systematically underestimate and overestimate high- and low-affinity samples, respectively.