Optimal tracking agent: a new framework of reinforcement learning for multiagent systems

Authors

  • Weihua Cao, Institute of Advanced Control and Intelligent Automation, School of Information Science and Engineering, Central South University, Changsha, China
  • Gang Chen, Institute of Advanced Control and Intelligent Automation, School of Information Science and Engineering, Central South University, Changsha, China
  • Xin Chen, Institute of Advanced Control and Intelligent Automation, School of Information Science and Engineering, Central South University, Changsha, China
  • Min Wu (corresponding author), Institute of Advanced Control and Intelligent Automation, School of Information Science and Engineering, Central South University, Changsha, China

  • The initial work was published in the Proceedings of the 6th International Conference on Frontier of Computer Science and Technology (FCST2011), pp. 1328–1334, November 16–19, 2011, Changsha, China, and received the Best Paper Award.

Correspondence to: Min Wu, Institute of Advanced Control and Intelligent Automation, School of Information Science and Engineering, Central South University, Changsha, 410083, China.

E-mail: min@csu.edu.cn

SUMMARY

The curse of dimensionality is a ubiquitous problem in multiagent reinforcement learning: the learning and storage space grows exponentially with the number of agents, which hinders the application of multiagent reinforcement learning. To relieve this problem, we propose a new framework named the optimal tracking agent (OTA). The OTA views the other agents as part of the environment and learns the optimal decision in a reduced form. Although merging the other agents into the environment reduces the dimension of the action space, the environment characterized in this way is dynamic and does not satisfy the convergence conditions of reinforcement learning (RL). We therefore develop an estimator to track the dynamics of the environment. The estimator yields a dynamic model, and model-based RL can then be used to react optimally to the dynamic environment. Because of the other agents' dynamics, the Q-function in the OTA is itself a dynamic process, unlike in traditional RL, where learning is stationary and the usual action selection mechanisms are suited only to such stationarity; we therefore improve the greedy action selection mechanism to adapt to these dynamics. With these modifications, the OTA converges. An experiment illustrates the validity and efficiency of the OTA.

Copyright © 2012 John Wiley & Sons, Ltd.
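
For intuition only, below is a minimal Python sketch of what an OTA-style learner could look like based on the description above: the other agents are folded into the environment, an empirical estimator tracks the resulting (non-stationary) transition and reward model, the Q-function is updated from that estimated model, and exploration in the greedy action selection is tied to the current tracking error. Since the full text is not reproduced here, the class structure, names, and the particular estimator and exploration rule are illustrative assumptions, not the authors' published algorithm.

    # Hypothetical sketch of an OTA-style learner; names and structure are
    # assumptions for illustration, not the authors' published algorithm.
    import numpy as np

    class OptimalTrackingAgent:
        """Single agent that treats the other agents as part of a dynamic
        environment, tracks that environment with a simple estimator, and
        runs model-based Q-learning on the estimated model."""

        def __init__(self, n_states, n_actions, gamma=0.95, alpha=0.1):
            self.n_states = n_states
            self.n_actions = n_actions
            self.gamma = gamma          # discount factor
            self.alpha = alpha          # learning rate for the Q update
            self.Q = np.zeros((n_states, n_actions))
            # Empirical transition counts and reward averages act as the
            # "estimator" tracking the non-stationary environment dynamics.
            self.counts = np.ones((n_states, n_actions, n_states))  # Laplace prior
            self.reward_sum = np.zeros((n_states, n_actions))
            self.reward_n = np.ones((n_states, n_actions))
            self.model_error = 1.0      # rough tracking error, drives exploration

        def select_action(self, s, rng):
            # Exploration is tied to how well the model currently tracks the
            # environment: the more the dynamics drift, the more we explore.
            eps = min(1.0, self.model_error)
            if rng.random() < eps:
                return int(rng.integers(self.n_actions))
            return int(np.argmax(self.Q[s]))

        def update(self, s, a, r, s_next):
            # 1. Update the dynamics estimator (track the other agents' effect).
            predicted = self.counts[s, a] / self.counts[s, a].sum()
            self.model_error = 0.9 * self.model_error + 0.1 * (1.0 - predicted[s_next])
            self.counts[s, a, s_next] += 1
            self.reward_sum[s, a] += r
            self.reward_n[s, a] += 1
            # 2. Model-based Q update using the estimated transition and reward model.
            P = self.counts[s, a] / self.counts[s, a].sum()
            R = self.reward_sum[s, a] / self.reward_n[s, a]
            self.Q[s, a] = (1 - self.alpha) * self.Q[s, a] + self.alpha * (
                R + self.gamma * P @ self.Q.max(axis=1)
            )

    # Usage (a hypothetical environment step function env_step(s, a) is assumed):
    # rng = np.random.default_rng(0)
    # agent = OptimalTrackingAgent(n_states=10, n_actions=4)
    # a = agent.select_action(s, rng)
    # s_next, r = env_step(s, a)
    # agent.update(s, a, r, s_next)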
