Bayesian non-parametric hidden Markov models with applications in genomics


O. Papaspiliopoulos, Department of Economics, Universitat Pompeu Fabra, Ramon Trias Fargas 25–27, Barcelona 8005, Spain.


Summary.  We propose a flexible non-parametric specification of the emission distribution in hidden Markov models and we introduce a novel methodology for carrying out the computations. Whereas current approaches use a finite mixture model, we argue in favour of an infinite mixture model given by a mixture of Dirichlet processes. The computational framework is based on auxiliary variable representations of the Dirichlet process and consists of a forward–backward Gibbs sampling algorithm of similar complexity to that used in the analysis of parametric hidden Markov models. The algorithm involves analytic marginalizations of latent variables to improve the mixing, facilitated by exchangeability properties of the Dirichlet process that we uncover in the paper. A by-product of this work is an efficient Gibbs sampler for learning Dirichlet process hierarchical models. We test the Monte Carlo algorithm proposed against a wide variety of alternatives and find significant advantages. We also investigate by simulations the sensitivity of the proposed model to prior specification and data-generating mechanisms. We apply our methodology to the analysis of genomic copy number variation. Analysing various real data sets we find significantly more accurate inference compared with state of the art hidden Markov models which use finite mixture emission distributions.