Anticipatory reward signals in ventral striatal neurons of behaving rats

Authors

  • Mehdi Khamassi*
    1. Laboratoire de Physiologie de la Perception et de l’Action, Collège de France, CNRS, 11 pl. Marcelin Berthelot, 75231 Paris Cedex 05, France
    2. ISIR, Université Pierre et Marie Curie – Paris 6, 75016 Paris, France

  • Antonius B. Mulder*
    1. Laboratoire de Physiologie de la Perception et de l’Action, Collège de France, CNRS, 11 pl. Marcelin Berthelot, 75231 Paris Cedex 05, France
    Present address: Cognitive Neurophysiology-CNCR, Department of Anatomy and Neurosciences, VU University Medical Center, van de Boechorststraat 7, 1081 BT, Amsterdam, The Netherlands.

  • Eiichi Tabuchi
    1. Laboratoire de Physiologie de la Perception et de l’Action, Collège de France, CNRS, 11 pl. Marcelin Berthelot, 75231 Paris Cedex 05, France
    Present address: Department of Analysis of Brain Function, Faculty of Food Nutrition, Toyama College, 444 Gankaiji, Toyama 930-0193, Japan.

  • Vincent Douchamps
    1. Laboratoire de Physiologie de la Perception et de l’Action, Collège de France, CNRS, 11 pl. Marcelin Berthelot, 75231 Paris Cedex 05, France

  • Sidney I. Wiener
    1. Laboratoire de Physiologie de la Perception et de l’Action, Collège de France, CNRS, 11 pl. Marcelin Berthelot, 75231 Paris Cedex 05, France

  * M.K. and A.B.M. contributed equally to this work.

Correspondence: Dr S. I. Wiener, as above.
E-mail: sidney.wiener@college-de-france.fr

Abstract

It has been proposed that the striatum plays a crucial role in learning to select appropriate actions, optimizing rewards according to the principles of ‘Actor–Critic’ models of trial-and-error learning. The ventral striatum (VS), as Critic, would employ a temporal difference (TD) learning algorithm to predict rewards and drive dopaminergic neurons. This study examined this model’s adequacy for VS responses to multiple rewards in rats. The respective arms of a plus-maze provided rewards of varying magnitudes; multiple rewards were provided at 1-s intervals while the rat stood still. Neurons discharged phasically prior to each reward, during both initial approach and immobile waiting, demonstrating that this signal is predictive and not simply motor-related. In different neurons, responses could be greater for early, middle or late droplets in the sequence. Strikingly, this activity often reappeared after the final reward, as if in anticipation of yet another. In contrast, previous TD learning models show decremental reward-prediction profiles during reward consumption due to a temporal-order signal introduced to reproduce accurate timing in dopaminergic reward-prediction error signals. To resolve this inconsistency in a biologically plausible manner, we adapted the TD learning model such that input information is nonhomogeneously distributed among different neurons. By suppressing reward temporal-order signals and varying richness of spatial and visual input information, the model reproduced the experimental data. This validates the feasibility of a TD-learning architecture where different groups of neurons participate in solving the task based on varied input information.
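For readers unfamiliar with the Critic’s computation, the sketch below illustrates the standard temporal difference (TD) value-learning rule referred to in the abstract. It is a minimal, illustrative Python/NumPy example, not the authors’ actual model: the state layout (three approach states followed by three reward droplets), reward magnitudes, discount factor and learning rate are all assumed here for illustration. With this conventional formulation, the learned reward prediction declines across successive droplets, i.e. the decremental profile that the recordings reported above do not show.

```python
import numpy as np

# Minimal TD(0) "Critic" sketch (illustrative only, not the published model).
# States 0-2: approach; states 3-5: three reward droplets delivered in sequence.
GAMMA = 0.9      # temporal discount factor (assumed)
ALPHA = 0.1      # learning rate (assumed)
REWARD = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # reward at each state
N_STATES = len(REWARD)

V = np.zeros(N_STATES)  # value (reward-prediction) estimates

for episode in range(500):
    for s in range(N_STATES):
        # Value of the successor state; the trial ends after the last droplet.
        v_next = V[s + 1] if s + 1 < N_STATES else 0.0
        # TD reward-prediction error, the signal ascribed to dopaminergic neurons.
        delta = REWARD[s] + GAMMA * v_next - V[s]
        # Critic update of the reward prediction.
        V[s] += ALPHA * delta

print("Learned reward predictions:", np.round(V, 3))
# The predictions at the droplet states decrease (≈2.7, 1.9, 1.0):
# the decremental profile contrasted with the recorded ventral striatal activity.
```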
