Standard Article

Total Expected Discounted Reward MDPs: Value Iteration Algorithm

  1. Theologos Bountourelis

Published Online: 15 FEB 2011

DOI: 10.1002/9780470400531.eorms0908

Wiley Encyclopedia of Operations Research and Management Science

How to Cite

Bountourelis, T. 2011. Total Expected Discounted Reward MDPs: Value Iteration Algorithm. Wiley Encyclopedia of Operations Research and Management Science.

Author Information

  1. University of Pittsburgh, Department of Industrial Engineering, Pittsburgh, Pennsylvania

Abstract

In this article, we discuss the total expected discounted reward Markov decision problem and its most popular solution method, the value iteration algorithm. We formally state the problem, characterize the optimal policy, and describe the algorithm in its simplest form. We conclude by discussing the problem assumptions and the more advanced versions of the algorithm, and by highlighting the relevant literature.

Keywords:

  • Markov decision processes;
  • total expected discounted reward;
  • value iteration algorithm
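
For orientation, the value iteration algorithm named in the abstract can be sketched as follows. This is a minimal illustration and not the article's own formulation: it assumes a finite state and action space and a discount factor strictly between 0 and 1. The names `value_iteration`, `q_values`, `P`, `R`, `gamma`, and `tol` are hypothetical, as is the two-state example. The stopping rule is the standard sup-norm test that yields a `tol`-optimal greedy policy.

```python
def q_values(P, R, gamma, v):
    """One-step lookahead values for every state-action pair.

    Assumed layout: P[s][a] is a list of (next_state, probability) pairs and
    R[s][a] is the expected immediate reward of action a in state s.
    """
    return [
        [R[s][a] + gamma * sum(p * v[t] for t, p in P[s][a])
         for a in range(len(P[s]))]
        for s in range(len(P))
    ]


def value_iteration(P, R, gamma, tol=1e-8, max_iter=100_000):
    """Repeatedly apply the Bellman optimality operator until the change is small."""
    assert 0.0 < gamma < 1.0, "this sketch assumes a discount factor in (0, 1)"
    v = [0.0] * len(P)
    # Successive iterates closer than this in sup norm give a tol-optimal greedy policy.
    threshold = tol * (1.0 - gamma) / (2.0 * gamma)
    for _ in range(max_iter):
        v_new = [max(row) for row in q_values(P, R, gamma, v)]
        delta = max(abs(x - y) for x, y in zip(v_new, v))
        v = v_new
        if delta < threshold:
            break
    # Greedy policy with respect to the final value estimate.
    q = q_values(P, R, gamma, v)
    policy = [max(range(len(row)), key=row.__getitem__) for row in q]
    return v, policy


# Hypothetical two-state example.
# State 0: action 0 stays (reward 1), action 1 moves to state 1 (reward 0).
# State 1: action 0 stays (reward 2).
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)]]]
R = [[1.0, 0.0], [2.0]]
v, policy = value_iteration(P, R, gamma=0.9)
```

In this example the values converge to approximately 18 for state 0 and 20 for state 1, and the greedy policy moves from state 0 to state 1, since staying in state 0 forever is worth only 1/(1 - 0.9) = 10.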