M.K. and A.B.M. contributed equally to this work.
Anticipatory reward signals in ventral striatal neurons of behaving rats
Article first published online: 23 OCT 2008
© The Authors (2008). Journal Compilation © Federation of European Neuroscience Societies and Blackwell Publishing Ltd
European Journal of Neuroscience
Volume 28, Issue 9, pages 1849–1866, November 2008
How to Cite
Khamassi, M., Mulder, A. B., Tabuchi, E., Douchamps, V. and Wiener, S. I. (2008), Anticipatory reward signals in ventral striatal neurons of behaving rats. European Journal of Neuroscience, 28: 1849–1866. doi: 10.1111/j.1460-9568.2008.06480.x
- Issue published online: 27 OCT 2008
- Received 12 June 2008, revised 12 August 2008, accepted 2 September 2008
Keywords:
- reinforcement learning
- TD learning
It has been proposed that the striatum plays a crucial role in learning to select appropriate actions, optimizing rewards according to the principles of ‘Actor–Critic’ models of trial-and-error learning. The ventral striatum (VS), as Critic, would employ a temporal difference (TD) learning algorithm to predict rewards and drive dopaminergic neurons. This study examined this model’s adequacy for VS responses to multiple rewards in rats. The respective arms of a plus-maze provided rewards of varying magnitudes; multiple rewards were provided at 1-s intervals while the rat stood still. Neurons discharged phasically prior to each reward, during both initial approach and immobile waiting, demonstrating that this signal is predictive and not simply motor-related. In different neurons, responses could be greater for early, middle or late droplets in the sequence. Strikingly, this activity often reappeared after the final reward, as if in anticipation of yet another. In contrast, previous TD learning models show decremental reward-prediction profiles during reward consumption due to a temporal-order signal introduced to reproduce accurate timing in dopaminergic reward-prediction error signals. To resolve this inconsistency in a biologically plausible manner, we adapted the TD learning model such that input information is nonhomogeneously distributed among different neurons. By suppressing reward temporal-order signals and varying richness of spatial and visual input information, the model reproduced the experimental data. This validates the feasibility of a TD-learning architecture where different groups of neurons participate in solving the task based on varied input information.
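The Critic mechanism invoked above can be illustrated with a minimal tabular TD(0) value learner. This is a generic sketch of temporal difference learning, not the authors' adapted model: states stand in for successive positions along one rewarded maze arm, and all parameter names and values here are illustrative assumptions.

```python
import numpy as np

def td_learn(n_states=5, n_episodes=500, alpha=0.1, gamma=0.9, reward=1.0):
    """Tabular TD(0) Critic sketch: states are successive steps along a
    single maze arm, with reward delivered only at the final step.
    All parameters (n_states, alpha, gamma) are illustrative choices."""
    V = np.zeros(n_states + 1)  # V[n_states] is the terminal state, value 0
    for _ in range(n_episodes):
        for s in range(n_states):
            r = reward if s == n_states - 1 else 0.0
            # Reward-prediction error (the signal attributed to dopamine neurons)
            delta = r + gamma * V[s + 1] - V[s]
            V[s] += alpha * delta
    return V[:n_states]

V = td_learn()
```

After training, the learned values ramp up toward the rewarded state (V[0] < V[1] < ... < V[4]), which is the kind of anticipatory reward-prediction profile a Critic is expected to produce before reward delivery; the paper's contribution concerns how such profiles behave across *multiple sequential* rewards, which this simple single-reward sketch does not capture.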