• clustering
• expectation maximization
• EM
• Gaussian mixture models
• GMM
• transcription factor binding site
• DNA motif


Identifying transcription factor binding sites remains a challenging problem, even though many computational tools have been proposed in the literature for this task. In this study, a method for discovering such DNA subsequences, known as motifs, is proposed. The method uses Gaussian mixture models fitted with the expectation-maximization (EM) algorithm. To show the potential of the proposed method, experiments are conducted on data sets extracted from the DNA sequences of various organisms. The proposed method is also compared with four other methods: MEME, MDScan, SOMBRERO, and the fuzzy C-means based motif finder. The results indicate that the proposed method is a promising tool for identifying over-represented DNA motifs.
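
To make the general idea concrete, the following is a minimal sketch of fitting a Gaussian mixture with EM to DNA subsequences. It is not the implementation evaluated in this study: the one-hot encoding of k-mers, the spherical (single-variance) covariance, the variance floor, the initialization from the first two windows, the toy sequences, and all function names (`one_hot`, `kmers`, `fit_gmm`, `consensus`) are assumptions made purely for illustration.

```python
import math

BASES = "ACGT"

def one_hot(kmer):
    """Encode a DNA k-mer as a flat 4*k vector of 0/1 values (assumed encoding)."""
    return [1.0 if b == base else 0.0 for b in kmer for base in BASES]

def kmers(seq, k):
    """Return all overlapping length-k windows of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def log_gauss(x, mean, var):
    """Log density of a spherical Gaussian with a single variance `var`."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * (len(x) * math.log(2 * math.pi * var) + sq / var)

def fit_gmm(points, init_means, n_iter=50, min_var=1e-3, eps=1e-12):
    """Fit a spherical Gaussian mixture with EM.

    Returns (weights, means, variances, responsibilities).
    """
    n, d, c = len(points), len(points[0]), len(init_means)
    means = [list(m) for m in init_means]
    variances = [1.0] * c
    weights = [1.0 / c] * c
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point,
        # computed in log space to avoid underflow.
        resp = []
        for x in points:
            logs = [math.log(weights[j]) + log_gauss(x, means[j], variances[j])
                    for j in range(c)]
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means, and variances from the
        # responsibilities; eps keeps an empty component from dividing by zero.
        for j in range(c):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / (n + c * eps)
            means[j] = [sum(r[j] * x[t] for r, x in zip(resp, points)) / nj
                        for t in range(d)]
            sq = sum(r[j] * sum((xi - mi) ** 2 for xi, mi in zip(x, means[j]))
                     for r, x in zip(resp, points))
            variances[j] = max(sq / (nj * d), min_var)
    return weights, means, variances, resp

def consensus(mean, k):
    """Read a consensus string off a component mean (most weighted base per position)."""
    return "".join(BASES[max(range(4), key=lambda b: mean[4 * p + b])]
                   for p in range(k))

# Toy sequences (hypothetical data, not from the experiments in this study).
seqs = ["GCTATAATGC", "ATATAATCCG", "CGGTATAATA"]
windows = [w for s in seqs for w in kmers(s, 6)]
points = [one_hot(w) for w in windows]
weights, means, variances, resp = fit_gmm(points, init_means=[points[0], points[1]])
print([consensus(m, 6) for m in means])
```

In this sketch each mixture component plays the role of a candidate motif, and its mean vector can be read as a position-wise base profile. A practical motif finder would additionally need a principled initialization, a way to choose the motif width and number of components, and a background model, none of which is attempted here.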