European Journal of Neuroscience

Special Issue: Beyond Simple Reinforcement Learning

April 2012

Volume 35, Issue 7

Pages 987–1200

  1. Beyond Simple Reinforcement Learning

    2. Model-based learning and the contribution of the orbitofrontal cortex to the model-free world (pages 991–996)

      Michael A. McDannald, Yuji K. Takahashi, Nina Lopatina, Brad W. Pietras, Josh L. Jones and Geoffrey Schoenbaum

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2011.07982.x

      Learning is proposed to occur when there is a discrepancy between reward prediction and reward receipt. At least two separate systems are thought to exist – one in which predictions are proposed to be based on model-free or cached values and another in which predictions are model-based.

    3. Re-evaluating the role of the orbitofrontal cortex in reward and reinforcement (pages 997–1010)

      M. P. Noonan, N. Kolling, M. E. Walton and M. F. S. Rushworth

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08023.x

      The orbitofrontal cortex and adjacent ventromedial prefrontal cortex carry reward representations and mediate flexible behaviour when circumstances change. Here we review how recent experiments in humans and macaques have confirmed the existence of a major difference between the functions of the ventromedial prefrontal and adjacent medial orbitofrontal cortex (vmPFC/mOFC) on the one hand and the lateral orbitofrontal cortex (lOFC) on the other.

    4. Dissociating hippocampal and striatal contributions to sequential prediction learning (pages 1011–1023)

      Aaron M. Bornstein and Nathaniel D. Daw

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2011.07920.x

      Behavior may be generated on the basis of many different kinds of learned contingencies. For instance, responses could be guided by the direct association between a stimulus and response, or by sequential stimulus–stimulus relationships (as in model-based reinforcement learning or goal-directed actions).

    5. How much of reinforcement learning is working memory, not reinforcement learning? A behavioral, computational, and neurogenetic analysis (pages 1024–1035)

      Anne G. E. Collins and Michael J. Frank

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2011.07980.x

      Instrumental learning involves corticostriatal circuitry and the dopaminergic system. This system is typically modeled in the reinforcement learning (RL) framework by incrementally accumulating reward values of states and actions.
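The incremental accumulation of reward values described in this abstract can be sketched as a simple delta-rule update; the reward stream, learning rate, and variable names below are illustrative, not taken from the article.

```python
# Minimal sketch of incremental value accumulation in the RL framework:
# each outcome nudges a cached value toward the received reward.

def update_value(q, reward, alpha=0.1):
    """Delta-rule update: move the cached value a fraction alpha
    of the way toward the observed reward."""
    return q + alpha * (reward - q)

q = 0.0
for r in [1, 1, 0, 1]:   # an illustrative stream of rewards for one action
    q = update_value(q, r)
print(round(q, 4))       # value slowly accumulates toward the mean reward
```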

    6. Habits, action sequences and reinforcement learning (pages 1036–1051)

      Amir Dezfouli and Bernard W. Balleine

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08050.x

      It is now widely accepted that instrumental actions can be either goal-directed or habitual; whereas the former are rapidly acquired and regulated by their outcome, the latter are reflexive, elicited by antecedent stimuli rather than their consequences. Model-based reinforcement learning (RL) provides an elegant description of goal-directed action.

    7. A theoretical account of cognitive effects in delay discounting (pages 1052–1064)

      Zeb Kurth-Nelson, Warren Bickel and A. David Redish

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08058.x

      Although delay discounting, the attenuation of the value of future rewards, is a robust finding, the mechanism of discounting is not known. We propose a potential mechanism in which discounting emerges from a search process that attempts to determine which rewards will be available in the future.
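The delay discounting effect described here is conventionally fit with a hyperbolic discount function; the sketch below shows that standard descriptive form (not the search-based mechanism the paper proposes), with an illustrative discount rate k.

```python
# Hyperbolic discounting: V = A / (1 + k * D), the standard descriptive
# model of how a reward's value attenuates with delay. k is illustrative.

def discounted_value(amount, delay, k=0.05):
    """Subjective value of a reward of size `amount` delivered
    after `delay` time units, under hyperbolic discounting."""
    return amount / (1.0 + k * delay)

# The same reward loses subjective value as its delay grows.
for d in (0, 10, 100):
    print(d, round(discounted_value(100, d), 2))
```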

    8. Decision value computation in DLPFC and VMPFC adjusts to the available decision time (pages 1065–1074)

      Peter Sokol-Hessner, Cendri Hutcherson, Todd Hare and Antonio Rangel

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08076.x

      It is increasingly clear that simple decisions are made by computing decision values for the options under consideration, and then comparing these values to make a choice. Computational models of this process suggest that it involves the accumulation of information over time, but little is known about the temporal course of valuation in the brain.

    9. Strategic control in decision-making under uncertainty (pages 1075–1082)

      Vinod Venkatraman and Scott A. Huettel

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08009.x

      Complex economic decisions – whether investing money for retirement or purchasing some new electronic gadget – often involve uncertainty about the likely consequences of our choices. Critical for resolving that uncertainty are strategic meta-decision processes, which allow people to simplify complex decision problems, to evaluate outcomes against a variety of contexts, and to flexibly match behavior to changes in the environment.

    10. Category representation and generalization in the prefrontal cortex (pages 1083–1091)

      Xiaochuan Pan and Masamichi Sakagami

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2011.07981.x

      Categorization is a function of the brain that serves to group together items and events in our environments. Here we review the following important issues related to category representation and generalization: namely, where categories are represented in the brain, and how the brain utilizes categorical membership to generate new information.

    11. Generalization of value in reinforcement learning by humans (pages 1092–1104)

      G. Elliott Wimmer, Nathaniel D. Daw and Daphna Shohamy

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08017.x

      Research in decision-making has focused on the role of dopamine and its striatal targets in guiding choices via learned stimulus–reward or stimulus–response associations, behavior that is well described by reinforcement learning (RL) theories. However, basic RL is relatively limited in scope and does not explain how learning about stimulus regularities or relations may guide decision-making.

    12. Different dorsal striatum circuits mediate action discrimination and action generalization (pages 1105–1114)

      Mónica Hilario, Terrell Holloway, Xin Jin and Rui M. Costa

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08073.x

      Generalization is an important process that allows animals to extract rules from regularities of past experience and apply them to analogous situations. In particular, the generalization of previously learned actions to novel instruments allows animals to use past experience to act faster and more efficiently in an ever-changing environment.

    13. Neural control of dopamine neurotransmission: implications for reinforcement learning (pages 1115–1123)

      Mayank Aggarwal, Brian I. Hyland and Jeffery R. Wickens

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08055.x

      In the past few decades there has been remarkable convergence of machine learning with neurobiological understanding of reinforcement learning mechanisms, exemplified by temporal difference (TD) learning models. The anatomy of the basal ganglia provides a number of potential substrates for instantiation of the TD mechanism.
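The temporal difference (TD) learning signal this review discusses can be written in a few lines; the two-state task, learning rate, and discount factor below are illustrative assumptions, not details from the article.

```python
# Minimal TD(0) sketch: delta = r + gamma * V(s') - V(s).
# An unexpected reward yields a positive error; with learning,
# the error transfers back to the predictive cue.

def td_error(reward, v_current, v_next, gamma=0.9):
    """One-step temporal difference prediction error."""
    return reward + gamma * v_next - v_current

values = {"cue": 0.0, "outcome": 0.0}
alpha = 0.5

# Trial 1: surprising reward at the outcome state.
delta = td_error(reward=1.0, v_current=values["outcome"], v_next=0.0)
values["outcome"] += alpha * delta

# Trial 2: the cue now predicts a valued outcome state,
# so the error propagates backward to the cue.
delta_cue = td_error(reward=0.0, v_current=values["cue"],
                     v_next=values["outcome"])
values["cue"] += alpha * delta_cue
print(values)
```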

    14. From prediction error to incentive salience: mesolimbic computation of reward motivation (pages 1124–1143)

      Kent C. Berridge

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.07990.x

      Reward contains separable psychological components of learning, incentive motivation and pleasure. Most computational models have focused only on the learning component of reward, but the motivational component is equally important in reward circuitry, and even more directly controls behavior.

    15. Decomposing effects of dopaminergic medication in Parkinson’s disease on probabilistic action selection – learning or performance? (pages 1144–1151)

      P. Smittenaar, H. W. Chase, E. Aarts, B. Nusselein, B. R. Bloem and R. Cools

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08043.x

      Dopamine has long been implicated in the acquisition of stimulus-response associations, but its effect on the expression of value-based representations is unclear. These data highlight a role for dopamine in reward-driven behavioral control beyond that already established in learning and plasticity.

    16. Instrumental vigour in punishment and reward (pages 1152–1168)

      Peter Dayan

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08026.x

      Recent notions about the vigour of responding in operant conditioning suggest that the long-run average rate of reward should control the alacrity of action in cases in which the actual cost of speed is balanced against the opportunity cost of sloth. This paper considers the generalization of this to the case of potentially avoidable punishment, deriving predictions for behaviour and the activity of dopaminergic and serotonergic neuromodulation.

    17. How can a Bayesian approach inform neuroscience? (pages 1169–1179)

      Jill X. O’Reilly, Saad Jbabdi and Timothy E. J. Behrens

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08010.x

      In this review we consider how Bayesian logic can help neuroscientists to understand behaviour and brain function. Firstly, we review some key characteristics of Bayesian systems: they integrate information making rational use of uncertainty, they apply prior knowledge in the interpretation of new observations, and (for several reasons) they are very effective learners.
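The first Bayesian property listed, integrating information with rational use of uncertainty, has a compact Gaussian illustration: prior and observation are combined in proportion to their precisions. The numbers below are purely illustrative.

```python
# Toy Gaussian-Gaussian Bayesian integration: the posterior mean is a
# precision-weighted (inverse-variance-weighted) average of the prior
# and the new observation, and the posterior is more precise than either.

def posterior(prior_mean, prior_var, obs_mean, obs_var):
    """Posterior mean and variance for a Gaussian prior and
    a single Gaussian observation."""
    prec = 1.0 / prior_var + 1.0 / obs_var
    mean = (prior_mean / prior_var + obs_mean / obs_var) / prec
    return mean, 1.0 / prec

# A reliable observation (low variance) pulls the estimate toward itself.
m, v = posterior(prior_mean=0.0, prior_var=4.0, obs_mean=10.0, obs_var=1.0)
print(m, v)
```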

    18. Uncertainty in action-value estimation affects both action choice and learning rate of the choice behaviors of rats (pages 1180–1189)

      Akihiro Funamizu, Makoto Ito, Kenji Doya, Ryohei Kanzaki and Hirokazu Takahashi

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2012.08025.x

      The estimation of reward outcomes for action candidates is essential for decision making. In this study, we examined whether and how the uncertainty in reward outcome estimation affects the action choice and the learning rate.

    19. Surprise! Neural correlates of Pearce–Hall and Rescorla–Wagner coexist within the brain (pages 1190–1200)

      Matthew R. Roesch, Guillem R. Esber, Jian Li, Nathaniel D. Daw and Geoffrey Schoenbaum

      Version of Record online: 4 APR 2012 | DOI: 10.1111/j.1460-9568.2011.07986.x

      Learning theory and computational accounts suggest that learning depends on errors in outcome prediction as well as changes in processing of or attention to events. These divergent ideas are captured by models, such as Rescorla–Wagner (RW) and temporal difference (TD) learning on the one hand, which emphasize errors as directly driving changes in associative strength, vs. models such as Pearce–Hall (PH) and more recent variants on the other hand, which propose that errors promote changes in associative strength by modulating attention and processing of events.
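The contrast this abstract draws between the two model families can be made concrete in a few lines: in Rescorla–Wagner (RW) the signed error changes associative strength directly at a fixed rate, whereas in Pearce–Hall (PH) the unsigned error updates an attention/associability term that gates learning. All parameter values below are illustrative.

```python
# Sketch contrasting RW and PH updates over a few rewarded trials.

def rw_update(v, reward, alpha=0.3):
    """RW: the prediction error itself drives the change in strength."""
    return v + alpha * (reward - v)

def ph_update(v, assoc, reward, kappa=0.3, eta=0.5):
    """PH: the unsigned error updates associability (attention),
    which in turn gates how much the next error changes strength."""
    error = reward - v
    v_new = v + kappa * assoc * error                  # learning gated by attention
    assoc_new = (1 - eta) * assoc + eta * abs(error)   # surprise updates attention
    return v_new, assoc_new

v_rw, v_ph, assoc = 0.0, 0.0, 0.5
for r in [1, 1, 1]:
    v_rw = rw_update(v_rw, r)
    v_ph, assoc = ph_update(v_ph, assoc, r)
print(round(v_rw, 3), round(v_ph, 3), round(assoc, 3))
```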