• Open Access

Uncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats

Authors

  • Akihiro Funamizu,

    1. JSPS Research Fellow, Ichibancho 8, Chiyoda-ku, Tokyo 102-8472, Japan
    2. Neural Computation Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna-son, Kunigami, Okinawa 904-0412, Japan
    3. Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
  • Makoto Ito,

    1. Neural Computation Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna-son, Kunigami, Okinawa 904-0412, Japan
  • Kenji Doya,

    1. Neural Computation Unit, Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna-son, Kunigami, Okinawa 904-0412, Japan
  • Ryohei Kanzaki,

    1. Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
    2. Research Center for Advanced Science and Technology, The University of Tokyo, Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
  • Hirokazu Takahashi

    1. Graduate School of Information Science and Technology, The University of Tokyo, Hongo 7-3-1, Bunkyo-ku, Tokyo 113-8656, Japan
    2. Research Center for Advanced Science and Technology, The University of Tokyo, Komaba 4-6-1, Meguro-ku, Tokyo 153-8904, Japan
    3. PRESTO, JST, 4-1-8 Honcho Kawaguchi, Saitama 332-0012, Japan

Hirokazu Takahashi, Research Center for Advanced Science and Technology, as above.
E-mail: takahashi@i.u-tokyo.ac.jp
Akihiro Funamizu, JSPS Research Fellow, as above.
E-mail: funamizu@oist.jp

Abstract

Estimating the reward outcomes of candidate actions is essential for decision making. In this study, we examined whether and how uncertainty in reward-outcome estimation affects action choice and learning rate. We designed a choice task in which rats selected either the left or the right poking hole and stochastically received a food pellet as a reward. The reward probabilities of the left and right holes were drawn from six settings (high, 100% vs. 66%; mid, 66% vs. 33%; low, 33% vs. 0% for the left vs. right holes, and the mirror-image assignments) and were changed every 20–549 trials. We used Bayesian Q-learning models to estimate the time course of the probability distribution of action values, and tested whether they explained the rats' behavior better than standard Q-learning models, which estimate only the mean of action values. Model comparison by cross-validation revealed that a Bayesian Q-learning model with an asymmetric update for reward and non-reward outcomes best fit the time course of the rats' choices. In the action-choice equation of this model, the estimated coefficient for the variance of the action value was positive, meaning that the rats were uncertainty seeking. Further analysis of the Bayesian Q-learning model suggested that uncertainty increased the effective learning rate. These results suggest that rats take uncertainty in action-value estimation into account, and that they have an uncertainty-seeking action policy and an uncertainty-dependent modulation of the effective learning rate.
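
The two effects named in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it assumes a Gaussian belief over each action value, updated Kalman-filter style, and a logistic choice rule with a variance bonus. The function names and the parameters `obs_noise`, `drift`, `beta`, and `phi` are hypothetical, chosen only for illustration.

```python
import math


def kalman_update(mean, var, reward, obs_noise=0.25, drift=0.01):
    """One Gaussian (Kalman-style) update of a single action value.

    Assumption: the belief over the action value is Gaussian with the given
    mean and variance; obs_noise and drift are illustrative constants.
    """
    gain = var / (var + obs_noise)           # effective learning rate
    new_mean = mean + gain * (reward - mean)
    new_var = (1.0 - gain) * var + drift     # shrink after observing, then diffuse
    return new_mean, new_var, gain


def prob_left(means, variances, beta=3.0, phi=1.0):
    """Logistic choice rule on mean plus a variance bonus.

    Assumption: phi > 0 makes the agent uncertainty seeking, mirroring the
    positive variance coefficient reported in the abstract.
    """
    u_left = means[0] + phi * variances[0]
    u_right = means[1] + phi * variances[1]
    return 1.0 / (1.0 + math.exp(-beta * (u_left - u_right)))


# Same mean and same reward, different uncertainty about the action value.
_, _, gain_uncertain = kalman_update(0.5, 1.0, 1.0)
_, _, gain_certain = kalman_update(0.5, 0.05, 1.0)

# Equal means, but the left action value is more uncertain.
p_bonus = prob_left((0.5, 0.5), (1.0, 0.05))
p_equal = prob_left((0.5, 0.5), (0.2, 0.2))
```

Under these assumptions, the update with the larger variance has the larger gain, which corresponds to a higher effective learning rate, and with `phi > 0` the more uncertain option is chosen more often even when the means are equal.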
