A Bayesian Hidden Markov Model for Motif Discovery Through Joint Modeling of Genomic Sequence and ChIP-Chip Data
Article first published online: 5 FEB 2009
© 2009, The International Biometric Society
Volume 65, Issue 4, pages 1087–1095, December 2009
Gelfond, J. A. L., Gupta, M. and Ibrahim, J. G. (2009), A Bayesian Hidden Markov Model for Motif Discovery Through Joint Modeling of Genomic Sequence and ChIP-Chip Data. Biometrics, 65: 1087–1095. doi: 10.1111/j.1541-0420.2008.01180.x
- Issue published online: 23 NOV 2009
- Received August 2007. Revised August 2008. Accepted August 2008.
Keywords:
- Data augmentation
- Gene regulation
- Tiling array
- Transcription factor binding site
Summary
We propose a unified framework for the analysis of chromatin (Ch) immunoprecipitation (IP) microarray (ChIP-chip) data for detecting transcription factor binding sites (TFBSs), or motifs. ChIP-chip assays focus the genome-wide search for TFBSs by isolating a sample of DNA fragments containing TFBSs and hybridizing this sample to a microarray whose probes correspond to segments tiled across the genome. Present analytical methods use a two-step approach: (i) analyze the array data to estimate IP-enrichment peaks, then (ii) analyze the corresponding sequences independently of the intensity information. The proposed model integrates peak finding and motif discovery in a unified Bayesian hidden Markov model (HMM) framework that accommodates the inherent uncertainty in both measurements. A Markov chain Monte Carlo algorithm is formulated for parameter estimation, adapting recursive techniques used for HMMs. In simulations and in an application to a yeast RAP1 dataset, the proposed method shows favorable TFBS discovery performance, in terms of both sensitivity and specificity, compared with currently available two-stage procedures.
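The summary mentions adapting recursive HMM techniques inside a Markov chain Monte Carlo sampler. A standard recursion of this kind is forward-filtering backward-sampling (FFBS), which draws the entire hidden-state path in a single Gibbs step. The sketch below is a minimal, hypothetical illustration, not the authors' algorithm. It assumes a two-state chain (background vs. IP-enriched) over probe intensities only, Gaussian emissions, and fixed, strictly positive transition probabilities, and it omits the sequence/motif component of the joint model. The names `ffbs` and `sample_from_log_weights` are illustrative.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of a Normal(mean, sd^2) at x (assumed Gaussian emission model)."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-weights."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sample_from_log_weights(log_w, rng):
    """Draw an index with probability proportional to exp(log_w[k])."""
    norm = logsumexp(log_w)
    u = rng.random()
    cum = 0.0
    for k, lw in enumerate(log_w):
        cum += math.exp(lw - norm)
        if u < cum:
            return k
    return len(log_w) - 1  # guard against floating-point round-off


def ffbs(y, trans, init, means, sds, rng):
    """Hypothetical FFBS step: one draw of the hidden path s from p(s | y, params).

    y        -- probe intensities along the tiled region, in genomic order
    trans    -- trans[i][j] = P(s_t = j | s_{t-1} = i); assumed strictly positive
    init     -- init[k] = P(s_0 = k)
    means/sds -- per-state Gaussian emission parameters (an assumption)
    """
    K, T = len(init), len(y)
    # Forward filtering: log_alpha[t][k] = log p(y_0..y_t, s_t = k)
    log_alpha = [[0.0] * K for _ in range(T)]
    for k in range(K):
        log_alpha[0][k] = math.log(init[k]) + log_normal_pdf(y[0], means[k], sds[k])
    for t in range(1, T):
        for k in range(K):
            prev = [log_alpha[t - 1][j] + math.log(trans[j][k]) for j in range(K)]
            log_alpha[t][k] = logsumexp(prev) + log_normal_pdf(y[t], means[k], sds[k])
    # Backward sampling: draw s_{T-1}, then s_t given s_{t+1}, back to s_0
    states = [0] * T
    states[T - 1] = sample_from_log_weights(log_alpha[T - 1], rng)
    for t in range(T - 2, -1, -1):
        nxt = states[t + 1]
        weights = [log_alpha[t][j] + math.log(trans[j][nxt]) for j in range(K)]
        states[t] = sample_from_log_weights(weights, rng)
    return states


# Toy usage with made-up intensities: state 1 marks a putative enriched run.
y = [0.1, -0.2, 0.0, 3.1, 2.9, 3.3, 0.2, -0.1]
path = ffbs(y, trans=[[0.9, 0.1], [0.2, 0.8]], init=[0.5, 0.5],
            means=[0.0, 3.0], sds=[0.5, 0.5], rng=random.Random(0))
print(path)
```

Because the toy emission means are six standard deviations apart, the sampled path almost surely labels the three high-intensity probes as state 1. In a full sampler this step would alternate with draws of the transition, emission, and (in the joint model) motif parameters.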