Standard Article

Reduction of a POMDP to an MDP

  1. Burhaneddin Sandikçi

Published Online: 15 JUN 2010

DOI: 10.1002/9780470400531.eorms0710

Wiley Encyclopedia of Operations Research and Management Science

How to Cite

Sandikçi, B. 2010. Reduction of a POMDP to an MDP. Wiley Encyclopedia of Operations Research and Management Science.

Author Information

  1. University of Chicago, Booth School of Business, Chicago, Illinois

Abstract

A partially observable Markov decision process (POMDP) is an appropriate mathematical modeling tool for dynamic stochastic systems where portions or all of the system states are not completely observable to the decision maker. In this respect, POMDPs generalize completely observable Markov decision processes (MDPs) by allowing infinitely many states to address partial observability. However, the resulting models suffer tremendously from computational intractability even for relatively small problems. Therefore, POMDPs are frequently approximated by solving variants of completely observable MDPs defined on a grid of finite states. This article summarizes the relationships between completely and partially observable MDPs and derives inequalities for the POMDP value function using the optimal value function of the grid-based MDPs.
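As a rough illustration of why the reduction leads to infinitely many states: in the standard construction, the state of the equivalent MDP is a belief, that is, a probability distribution over the hidden states, which is revised by Bayes' rule after each action and observation. The sketch below is not taken from the article. The function name `belief_update`, the array layouts `T[a][s][s2]` and `O[a][s2][o]`, and the two-state numbers are all assumptions chosen for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """Bayes update of a belief vector (illustrative sketch).

    Assumed layouts (hypothetical, not from the article):
      T[a][s][s2] = P(next state s2 | current state s, action a)
      O[a][s2][o] = P(observation o | next state s2, action a)
    """
    n = len(belief)
    # Predict the next-state distribution, then weight by the observation likelihood.
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief and action")
    # Normalize so the updated belief sums to one.
    return [p / total for p in unnormalized]


# Hypothetical two-state, one-action, two-observation model.
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.7, 0.3], [0.4, 0.6]]]
new_belief = belief_update([0.5, 0.5], 0, 0, T, O)
```

Because every distinct belief vector is a distinct state of the reduced MDP, the state space is a continuum even when the underlying model has only two hidden states. A grid-based approximation would restrict attention to a finite set of such vectors.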

Keywords:

  • MDP;
  • POMDP;
  • grid-based approximation;
  • linear programming