Hidden Markov Models
Published Online: 16 MAY 2011
Copyright © 2001 John Wiley & Sons, Ltd. All rights reserved.
How to Cite
Przytycka, T. M. and Zheng, J. 2011. Hidden Markov Models. eLS. .
A hidden Markov model (HMM) is a statistical approach frequently used for modelling biological sequences. In this approach, a sequence is modelled as the output of a discrete stochastic process, which progresses through a series of states that are ‘hidden’ from the observer. Each hidden state emits a symbol representing an elementary unit of the modelled data; for example, in the case of a protein sequence, an amino acid. The parameters of an HMM can be estimated by learning from training data. Efficient algorithms are available to infer the most likely path of states for a given sequence, and these inferred paths often lead to biological predictions and interpretations. Thanks to well-developed theory and algorithms, HMMs have found wide application in diverse areas of computational molecular biology.
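Inferring the most likely path of hidden states is classically done with the Viterbi algorithm, a dynamic-programming recursion over sequence positions. The sketch below works in log space to avoid numerical underflow on long sequences. The two-state model (H for a GC-rich region, L for an AT-rich region) and all of its probabilities are hypothetical values chosen only for illustration, and the code assumes every probability it uses is strictly positive.

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for seq and its log-probability.

    Assumes every probability used is strictly positive (math.log(0) raises ValueError).
    """
    # score[s]: best log-probability of any path ending in state s at the current position
    score = {s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}
    backpointers = []  # one dict per later position: state -> best predecessor state
    for symbol in seq[1:]:
        new_score, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: score[r] + math.log(trans_p[r][s]))
            new_score[s] = score[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][symbol])
            pointers[s] = prev
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path
    last = max(states, key=lambda s: score[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[last]

# Hypothetical two-state model: H = GC-rich region, L = AT-rich region
STATES = ("H", "L")
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path, log_p = viterbi("GCGCGCGCATATATAT", STATES, START, TRANS, EMIT)
print("".join(path))  # HHHHHHHHLLLLLLLL
```

With these parameters a single switch from H to L is cheaper than emitting eight A/T symbols from the GC-rich state, so the decoded path changes state exactly once.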
A hidden Markov model is a statistical approach for modelling sequences, with broad applications in computational biology.
In an HMM, a biological sequence is modelled as being generated by a stochastic process that moves from one state to the next, where each state emits one element of the sequence according to an emission probability distribution that, in general, differs from state to state.
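The generative description can be made concrete by running a small HMM forward: pick a start state, emit a symbol from that state's emission distribution, move to the next state, and repeat. The joint probability of a state path and a sequence is then the start probability multiplied by every transition and emission probability along the path. The model below and the helper names `draw`, `generate` and `joint_probability` are hypothetical, chosen for illustration only.

```python
import random

# Hypothetical two-state model over the DNA alphabet (H = GC-rich, L = AT-rich)
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.9, "L": 0.1}, "L": {"H": 0.1, "L": 0.9}}
EMIT = {
    "H": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "L": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

def draw(dist, rng):
    """Sample one key from a {outcome: probability} dictionary."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]

def generate(n, rng):
    """Run the HMM for n steps, returning the hidden path and the emitted sequence."""
    path, seq = [], []
    state = draw(START, rng)
    for _ in range(n):
        path.append(state)
        seq.append(draw(EMIT[state], rng))  # the current state emits one symbol
        state = draw(TRANS[state], rng)     # then the process moves to the next state
    return "".join(path), "".join(seq)

def joint_probability(path, seq):
    """P(path, seq): start probability times every transition and emission along the path."""
    p = START[path[0]] * EMIT[path[0]][seq[0]]
    for prev, state, symbol in zip(path, path[1:], seq[1:]):
        p *= TRANS[prev][state] * EMIT[state][symbol]
    return p

rng = random.Random(0)
hidden, observed = generate(20, rng)
print(observed)  # the observer sees only this
print(hidden)    # the state path that produced it stays hidden in practice
```

For example, `joint_probability("HL", "GA")` is 0.5 × 0.4 × 0.1 × 0.4 = 0.008.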
Training an HMM is the process of estimating the model parameters from a training set of representative examples.
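In the simplest setting the hidden state path of every training example is known (for instance, annotated regions), and maximum-likelihood training reduces to counting transitions and emissions and normalising the counts. When the paths are not known, parameters are usually estimated iteratively with the Baum-Welch (expectation-maximisation) algorithm, which this sketch does not cover. The training data and function names below are hypothetical.

```python
from collections import Counter, defaultdict

def normalise(counts):
    """Turn a Counter of observations into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def train_from_labelled(examples):
    """Maximum-likelihood HMM parameters from (state_path, sequence) pairs with known paths."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for path, seq in examples:
        start[path[0]] += 1
        for a, b in zip(path, path[1:]):
            trans[a][b] += 1          # count each observed state-to-state transition
        for state, symbol in zip(path, seq):
            emit[state][symbol] += 1  # count each symbol emitted by each state
    return (normalise(start),
            {s: normalise(c) for s, c in trans.items()},
            {s: normalise(c) for s, c in emit.items()})

# Hypothetical training set: H = GC-rich state, L = AT-rich state
examples = [("HHLL", "GCAT"), ("HL", "GA")]
start_p, trans_p, emit_p = train_from_labelled(examples)
print(trans_p["H"])  # H->H is 1/3 and H->L is 2/3 on this data
```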
Overfitting (overtraining) occurs when the model parameters fit the training set well but the model fails to generalise beyond it to a larger set of data.
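A common symptom of overfitting with small training sets is that symbols never seen in training receive probability zero, so any new sequence containing them is judged impossible. One widely used remedy is to add pseudocounts before normalising; the sketch below uses a uniform pseudocount of 1 (Laplace's rule) for one state's amino-acid emission distribution. The observed residues, the pseudocount value and the helper name `emission_estimate` are hypothetical; practical tools often use more informed priors.

```python
def emission_estimate(observed, alphabet, pseudocount=0.0):
    """Estimate one state's emission distribution from observed symbols, with optional pseudocounts."""
    counts = {a: pseudocount for a in alphabet}
    for x in observed:
        counts[x] += 1
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
training_column = "LLLIIV"  # hypothetical residues observed for one state in a small training set

ml = emission_estimate(training_column, AMINO_ACIDS)                        # pure maximum likelihood
smoothed = emission_estimate(training_column, AMINO_ACIDS, pseudocount=1.0)  # Laplace pseudocounts
print(ml["M"], smoothed["M"])  # M was never observed: zero under ML, small but nonzero after smoothing
```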
Gene finding is the computational identification of genes, including their exon/intron structure, in a genome.
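In HMM-based gene finding, each position of the genomic sequence is assigned a hidden state, and the gene structure is read off the decoded state path. Real gene finders use many more states (for example, to track codon position and splice signals); the sketch below assumes a hypothetical, already decoded path with only three labels (N = intergenic, E = exon, I = intron) and shows how exon coordinates can be extracted from it.

```python
from itertools import groupby

def segments(labels):
    """Collapse a per-position label string into (label, start, end) runs; end is exclusive."""
    result, pos = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        result.append((label, pos, pos + length))
        pos += length
    return result

decoded = "NNNEEEEIIIIIEEEENN"  # hypothetical decoded path: N=intergenic, E=exon, I=intron
exons = [(start, end) for label, start, end in segments(decoded) if label == "E"]
print(exons)  # [(3, 7), (12, 16)]
```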
- gene finding;
- profile HMM;
- training HMM;
- protein structure prediction;