• Open Access

Contextual interaction between novelty and reward processing within the mesolimbic system

Authors

  • Nico Bunzeck,

    Corresponding author
    1. Department of Systems Neuroscience, University Medical Center Hamburg-Eppendorf, Martinistrasse 52, 20246 Hamburg, Germany
  • Christian F. Doeller,

    1. Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London, WC1N 3AR, United Kingdom
    2. Institute of Neurology, University College London, 12 Queen Square, London WC1N 3BG, United Kingdom
    3. Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Centre for Cognitive Neuroimaging, Nijmegen, The Netherlands
  • Ray J. Dolan,

    1. Wellcome Trust Centre for Neuroimaging at UCL, 12 Queen Square, London WC1N 3BG, United Kingdom
  • Emrah Duzel

    1. Institute of Cognitive Neuroscience, University College London, 17 Queen Square, London, WC1N 3AR, United Kingdom
    2. Institute of Cognitive Neurology, Otto von Guericke University, Leipziger Strasse 44, 39120 Magdeburg, Germany

Abstract

Medial temporal lobe (MTL) dependent long-term memory for novel events is modulated by a circuitry that also responds to reward and includes the ventral striatum, dopaminergic midbrain, and medial orbitofrontal cortex (mOFC). This common neural network may reflect a functional link between novelty and reward whereby novelty motivates exploration in the search for rewards; a link also termed novelty “exploration bonus.” We used fMRI in a scene encoding paradigm to investigate the interaction between novelty and reward with a focus on neural signals akin to an exploration bonus. As expected, reward related long-term memory for the scenes (after 24 hours) strongly correlated with activity of MTL, ventral striatum, and substantia nigra/ventral tegmental area (SN/VTA). Furthermore, the hippocampus showed a main effect of novelty, the striatum showed a main effect of reward, and the mOFC signalled both novelty and reward. An interaction between novelty and reward akin to an exploration bonus was found in the hippocampus. These data suggest that MTL novelty signals are interpreted in terms of their reward-predicting properties in the mOFC, which biases striatal reward responses. The striatum together with the SN/VTA then regulates MTL-dependent long-term memory formation and contextual exploration bonus signals in the hippocampus. Hum Brain Mapp, 2011. © 2011 Wiley-Liss, Inc.

INTRODUCTION

Novelty is a motivationally salient learning signal that attracts attention, promotes memory encoding and modifies goal-directed behavior [Knight, 1996; Lisman and Grace, 2005; Mesulam, 1998; Sokolov, 1963]. Recent evidence from human and nonhuman primate studies raises the possibility that the motivational aspects of novelty partly relate to its shared properties with reward [Bunzeck and Duzel, 2006; Kakade and Dayan, 2002; Mesulam, 1998]. This suggestion follows from observations that in animal studies the substantia nigra/ventral tegmental area (SN/VTA) of the midbrain is activated by stimuli that predict rewards as well as stimuli that are novel [Ljungberg, et al. 1992]; for a review see [Lisman and Grace, 2005]. Similarly, the human SN/VTA is activated both by reward [Knutson and Cooper, 2005] and novelty [Bunzeck and Duzel, 2006; Bunzeck, et al. 2007; Wittmann, et al. 2005] as well as by cues predicting their occurrence [Knutson and Cooper, 2005; O'Doherty, et al. 2002; Wittmann, et al. 2005, 2007]. The neurotransmitter dopamine that is produced in the SN/VTA profoundly regulates motivational aspects of behavior [Berridge, 2007; Niv, et al. 2007].

Furthermore, there is converging evidence that the hippocampus, a medial temporal lobe (MTL) structure, which is critical for the formation of long-term episodic memories for novel events, is also implicated in various forms of reward learning [Devenport, et al. 1981; Holscher, et al. 2003; Ploghaus, et al. 2000; Purves, et al. 1995; Rolls and Xiang, 2005; Solomon, et al. 1986; Tabuchi, et al. 2000; Weiner, 2003; Wirth, et al. 2009]. For instance, the rodent hippocampus shows increased activity in baited but not unbaited maze arms [Holscher, et al. 2003]; in nonhuman primates it is involved in learning place reward associations [Rolls and Xiang, 2005]; hippocampal activity follows prediction error learning rules for aversive stimuli in humans [Ploghaus, et al. 2000]; and reward increases synchronization between hippocampus and nucleus accumbens neurons [Tabuchi, et al. 2000].

A commonality in the effects of reward and novelty can be reconciled theoretically by a suggestion that novelty acts to motivate exploration of an environment to harvest rewards [Kakade and Dayan, 2002]. According to this suggestion, a key motivational property of novelty is its potential to predict rewards, whereas familiar stimuli, if repeated in the absence of reward, gradually lose this potential. The exploration bonus hypothesis makes two types of predictions: a first one relates to the potency with which the status of being novel or familiar can predict reward and a second one relates to the contextually remote effects of this contingency on other stimuli. According to the first prediction, being a novel stimulus should be a more potent predictor of reward than being a familiar stimulus [e.g., Wittmann, et al. 2008]. That is, when novel stimuli predict reward, reward expectancy should be higher than when familiar stimuli predict rewards. The second (more indirect) prediction is that the motivationally enhancing effect of novelty on exploratory behavior should have a contextual effect on the motivational significance of other stimuli that are present in the same context. Compatible with this suggestion, Bunzeck and Duzel [2006] showed that in a context in which novel stimuli are present, familiar stimuli show less repetition suppression in MTL structures. This suggests that even in the absence of explicit reward, in a context in which novel stimuli are present, there is a stronger motivation to explore also the familiar stimuli in that context [Bunzeck and Duzel, 2006]. However, to date, these predictions about the relationship between novelty and reward have not been tested directly. In experimental terms, this requires manipulating the reward-predicting property of novelty such that rewards in a given context are predicted either by being novel or by being familiar.
Here, we used this experimental approach to investigate the functional interaction between novelty and reward in an fMRI study.

Understanding the functional interaction between novelty and reward has profound implications for understanding how long-term plasticity for novel stimuli is regulated. A large body of physiological evidence shows that dopamine originating from the SN/VTA not only regulates motivational aspects of behavior but is critical for enhancing and stabilizing hippocampal plasticity [Frey and Morris, 1998; Li, et al. 2003] and hippocampus-dependent memory consolidation [O'Carroll, et al. 2006]. According to the so-called hippocampus-VTA loop model [Lisman and Grace, 2005] novelty signals are generated in the hippocampus and are conveyed to the SN/VTA through the nucleus accumbens and the ventral pallidum [Lisman and Grace, 2005]. Although the model emphasizes novelty itself as the key cognitive signal to modulate dopamine from the SN/VTA, it also explicitly raises the question how motivational factors regulate the impact of novelty on the activity of the hippocampus and the SN/VTA. The goal of this study is to approach this question from the vantage point of shared properties between novelty and reward and their functional interaction.

If novelty acts as a signal that motivates exploration to harvest rewards [Bunzeck and Duzel, 2006; Kakade and Dayan, 2002; Wittmann, et al. 2008] parts of the hippocampus-SN/VTA loop should only show a preferential response to novelty in a context where being novel predicts rewards but not in a context where being familiar predicts reward. At the same time, the enhancement of exploration when being novel is rewarded should boost hippocampal responses to familiar stimuli that are presented in the same context, even though these would not predict rewards. In contrast, in a context in which being familiar but not being novel predicts rewards, there should be less contextual motivation to explore and consequently hippocampal activity should be low for both the novel and the familiar stimuli in that context. Hence, the hypothesis that novelty has an intrinsic property to motivate explorative behavior in the search for rewards leads to the prediction of an interaction between the novelty- and reward status of stimuli. Accordingly, the hippocampus would respond strongly to both novel and familiar stimuli when being novel predicts reward and weakly to both novel and familiar stimuli when being familiar predicts reward.

The alternative possibility is that the novelty status and the reward status of information are independent. According to this possibility, there should be no functional interaction between novelty and reward. In other words, parts of the hippocampus-SN/VTA loop would only express a main effect of novelty or reward but no interaction between both.

Taken together, manipulating the contingency between novelty and rewards can help to understand the key mechanisms that drive novelty responses within the mesolimbic system. To that end, we developed a paradigm where receiving monetary reward was contingent upon the novelty status of images of scenes [Bunzeck, et al. 2009]. Thus, making correct reward preference decisions (see methods) was only possible after correctly discriminating novel and familiar stimuli. Importantly, we assessed recognition memory one day after encoding and thus were able to identify to what extent components of the hippocampal-SN/VTA loop would correlate with the reward-related enhancement of long-term memory for novel and familiar stimuli.

MATERIALS AND METHODS

Two experiments were performed. While the first experiment (Experiment 1) was a behavioral experiment the second experiment (Experiment 2) involved behavioral measures and fMRI.

Subjects

In Experiment 1, 17 adults participated (13 female and four male; age range 19–33 years; mean 23.1, SD = 4.73 years) and 14 adults participated in Experiment 2 (five male and nine female; age range: 19–34 years; mean = 22.4 years; SD = 3.8 years). All subjects were healthy, right-handed and had normal or corrected-to-normal acuity. None of the participants reported a history of neurological, psychiatric, or medical disorders or any current medical problems. All experiments were run with each subject's written informed consent and according to the local ethics clearance (University College London, UK).

Experimental Design and Task

In both experiments, three sets of (1) a familiarization phase followed by (2) a recognition memory based preference judgment task were performed. Here, new images were used for each set resulting in 120 novel and 120 familiar images being used altogether. The experimental procedures were identical for both experiments except that Experiment 1 was performed on a computer screen and Experiment 2 was performed inside an MRI scanner. (3) On day two, recognition memory for all presented images was tested using the “remember/know” procedure (see below).

(1) Familiarization: Subjects were initially familiarized with a set of 40 images (20 indoor and 20 outdoor images). Here, each picture was presented twice in random order for 1.5 s with an interstimulus interval (ISI) of 3 s and subjects indicated the indoor/outdoor status using their right hand index and middle finger. (2) Recognition memory test: subsequently, subjects performed a 9 minute recognition memory based preference judgment task (session). This part (session) was further subdivided into two blocks containing each 20 images from the familiarization phase (referred to as “familiar images”) and 20 previously not presented images (referred to as “novel images”; subjects could pause for 20 s between blocks). In any given block either novel images served as CS+ and familiar images as CS− or vice versa (Fig. 1). Participants were instructed to make a “preference” judgment to each image via a two-choice button press indicating “I prefer” or “I do not prefer” depending on the contingency between novelty status and reinforcement value. Importantly, the term “preferred” and “not-preferred” refers to the reward predicting status of the image (depending on the contextual contingency) rather than the aesthetic properties of the picture.

Figure 1.

Experimental design.

The contingency was randomized and indicated on the screen prior to each run by either “Novelty will be rewarded if preferred” (in which case novel images served as CS+ and familiar images as CS−) or “Familiarity will be rewarded if preferred” (here familiar images served as CS+ and novel images as CS−). Only correct “I prefer” responses following a CS+ led to a win of £0.50 whereas (incorrect) “I prefer” responses following CS− led to a loss of £0.10. Both correct “I do not prefer” responses following CS− and (incorrect) “I do not prefer” responses following a CS+ led to neither win nor loss. Images were presented in random order for 1 s on a gray background followed by a white fixation cross for 2 s (ISI = 3 s). To ensure that neural reward responses were limited to the presented images (i.e., reward anticipation rather than outcome) no feedback was given on a trial-by-trial basis. Instead subjects were informed about their overall performance after each session (containing 2 blocks with each contingency). Prior to the experiment the subjects were instructed to respond as quickly and as correctly as possible and that only 20% of all earnings would be paid.
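The payoff rules above can be summarized in a few lines of code. The following is only an illustrative sketch: the function names and the example block of responses are hypothetical, and the 20% payout is applied to the summed earnings as one plausible reading of the instruction given to subjects.

```python
def trial_payoff(is_cs_plus: bool, says_prefer: bool) -> float:
    """Payoff in pounds for a single trial under the stated rules."""
    if not says_prefer:
        # "I do not prefer" never wins or loses, whether correct or not.
        return 0.0
    # "I prefer" wins 0.50 after a CS+ and loses 0.10 after a CS-.
    return 0.50 if is_cs_plus else -0.10


def block_earnings(trials, paid_fraction=0.20):
    """Sum payoffs over (is_cs_plus, says_prefer) pairs; return (total, amount paid)."""
    total = sum(trial_payoff(cs_plus, prefer) for cs_plus, prefer in trials)
    return total, total * paid_fraction


# Hypothetical block with 20 CS+ and 20 CS- images:
# 18 correct "prefer", 2 misses, 2 false "prefer", 18 correct rejections.
trials = (
    [(True, True)] * 18
    + [(True, False)] * 2
    + [(False, True)] * 2
    + [(False, False)] * 18
)
total, paid = block_earnings(trials)
```

Because a false "I prefer" costs only a fifth of what a correct one earns, the scheme rewards responding "I prefer" whenever a subject is reasonably sure the image is a CS+.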

All images were gray-scaled and normalized to a mean gray-value of 127 and a standard deviation of 75. None of the scenes depicted human beings or parts of human beings including faces in the foreground.
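A luminance normalization of this kind can be sketched as a z-score followed by rescaling. The code below is an assumption about how such a step might look, not the authors' procedure: it uses the population standard deviation and does not clip values to the 0 to 255 range, which a real image pipeline would have to handle.

```python
from statistics import fmean, pstdev


def normalize_gray(pixels, target_mean=127.0, target_sd=75.0):
    """Rescale gray values so they have the requested mean and standard deviation."""
    mu = fmean(pixels)
    sd = pstdev(pixels)  # population SD; an assumption, the source does not say which
    if sd == 0:
        raise ValueError("a flat image has no contrast to rescale")
    return [target_mean + target_sd * (p - mu) / sd for p in pixels]


# Hypothetical gray values of a tiny image, flattened to one list.
example = [10, 40, 90, 130, 200, 250]
normalized = normalize_gray(example)
```

Note that with a target SD of 75 around a mean of 127, some rescaled values can fall outside 0 to 255; clipping them would slightly change the final mean and SD.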

Training Sessions

Each subject performed two training sessions prior to the experiment. Similar to the actual experiment both training phases began with a familiarization phase, during which only 10 images were presented twice in random order (duration = 1.5 s; ISI = 3 s) and subjects indicated their indoor/outdoor status. As was the case for the main experiment, familiarization was followed by a memory based preference judgment task including familiar and novel images. For training purposes, in training session 1 feedback was given on a trial-by-trial basis after each response. In training session 2 reward feedback was not shown immediately after each stimulus/response. Following each training session, the subject's financial reward (maximum £1) was reported to the subject. In Experiment 2, subjects also received a brief training session containing 10 familiar and 10 novel images per response-contingency block.

One day later, subjects performed an incidental recognition memory test following the “remember/know” procedure [Tulving, 1985]. Here, in random order all 240 previously seen pictures (60 per condition) were presented together with 60 new distractor pictures at the center of a computer screen. Task: The subject first made an “old/new” decision to each individually presented picture using their right index or middle finger. Following a “new” decision, subjects were prompted to indicate whether they were confident (“certainly new”) or unsure (“guess”), again using their right index and middle finger. After an “old” decision, subjects were prompted to indicate if they were able to remember something specific about seeing the scene at study (“remember” response), just felt familiarity with the picture without any recollective experience (“familiar” response) or were merely guessing that the picture was an old one (“guess” response). The subject had 4 s to make each of the two judgments and there was a break of 15 s after every 75 pictures.

fMRI Methods

We performed fMRI on a 3-Tesla Siemens Allegra magnetic resonance scanner (Siemens, Erlangen, Germany) with echo planar imaging (EPI) using a quadrature transceiver coil with a design based on the “birdcage” principle. In the functional session 48 T2*-weighted images (EPI-sequence; covering the whole head) per volume with blood oxygenation level-dependent (BOLD) contrast were obtained (matrix size: 64 × 64; 48 oblique axial slices per volume angled at −30° in the antero-posterior axis; spatial resolution: 3 × 3 × 3 mm; TR = 3120 ms; TE = 30 ms; z-shimming pre-pulse gradient moment of PP = 0 mT/m*ms; positive phase-encoding polarity). The fMRI acquisition protocol was optimized to reduce susceptibility-induced BOLD sensitivity losses in inferior frontal regions and temporal lobe regions [Deichmann, et al. 2003; Weiskopf, et al. 2006]. For each subject functional data were acquired in three scanning sessions containing 180 volumes per session. Six additional volumes per session were acquired at the beginning of each series to allow for steady state magnetization and were subsequently discarded from further analysis. Anatomical images of each subject's brain were collected using multi-echo 3D FLASH for mapping proton density, T1 and magnetization transfer (MT) at 1 mm resolution [Helms, et al. 2009; Weiskopf and Helms, 2008] and by T1 weighted inversion recovery prepared EPI (IR-EPI) sequences (matrix size: 64 × 64; 64 slices; spatial resolution: 3 × 3 × 3 mm). Additionally, individual field maps were recorded using a double echo FLASH sequence (matrix size = 64 × 64; 64 slices; spatial resolution = 3 × 3 × 3 mm; gap = 1 mm; short TE = 10 ms; long TE = 12.46 ms; TR = 1020 ms) for distortion correction of the acquired EPI images [Weiskopf, et al. 2006]. Using the “FieldMap toolbox” [Hutton, et al. 2002, 2004] field maps were estimated from the phase difference between the images acquired at the short and long TE.
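As a rough illustration of the last step, a phase difference between the two echoes maps to a field offset in Hz by dividing by 2π times the echo-time difference. The sketch below uses the echo times reported above; the function name is hypothetical, and the FieldMap toolbox additionally performs steps (such as phase unwrapping) that are not shown here.

```python
import math


def phase_to_hz(delta_phi_rad, te_short_ms=10.0, te_long_ms=12.46):
    """Convert an (unwrapped) phase difference in radians to a field offset in Hz."""
    delta_te_s = (te_long_ms - te_short_ms) / 1000.0  # 2.46 ms for this protocol
    return delta_phi_rad / (2.0 * math.pi * delta_te_s)


# Largest offset that can be represented without phase wrapping (phase = pi).
max_unambiguous_hz = phase_to_hz(math.pi)
```

With an echo-time difference of 2.46 ms, offsets beyond roughly ±203 Hz wrap around, which is why unwrapping is needed in regions with strong susceptibility gradients.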

The fMRI data were preprocessed and statistically analyzed using the SPM5 software package (Wellcome Trust Centre for Neuroimaging, University College London, UK) and MATLAB 7 (The MathWorks, Inc., Natick, MA). All functional images were corrected for motion artifacts by realignment to the first volume; corrected for distortions based on the field map [Hutton, et al. 2002]; corrected for the interaction of motion and distortion using the “Unwarp toolbox” [Andersson, et al. 2001; Hutton, et al. 2004]; spatially normalized to a standard T1-weighted SPM-template [Ashburner and Friston, 1999] (care was taken that in particular midbrain regions aligned with the standard-template); re-sampled to 2 × 2 × 2 mm; and smoothed with an isotropic 4 mm full-width half-maximum Gaussian kernel. Such fine-scale spatial resolution in combination with a relatively small smoothing kernel is the basis for being able to detect small clusters of activation, for instance within the midbrain and MTL regions where differential activation patterns (i.e., novelty responses and interactions between novelty and reward) might be located in close proximity [Bunzeck, et al. 2010]. The fMRI time series data were high-pass filtered (cutoff = 128 s) and whitened using an AR(1)-model. For each subject an event-related statistical model was computed by creating a “stick function” for each event onset (duration = 0 s), which was convolved with the canonical hemodynamic response function combined with time and dispersion derivatives [Friston, et al. 1998]. Modeled conditions included novel-rewarded, novel-not-rewarded, familiar-rewarded, familiar-not-rewarded and incorrect responses. To capture residual movement-related artifacts six covariates were included (the three rigid-body translations and three rotations resulting from realignment) as regressors of no interest.
Regionally specific condition effects were tested by employing linear contrasts for each subject and each condition (first-level analysis). The resulting contrast images were entered into a second-level random-effects analysis. Here, the hemodynamic effects of each condition were assessed using a 2 × 2 analysis of variance (ANOVA) with the factors “reward” (rewarding, not rewarding) and “novelty” (novel, familiar). This model allowed us to test for main effects of novelty, main effects of reward and the interaction between both. All contrasts were thresholded at P = 0.001 (uncorrected) except the regression analyses (P = 0.005, uncorrected). Both relatively liberal thresholds were chosen based on our precise a priori anatomical hypotheses within the mesolimbic system.
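The construction of an event-related regressor from stick functions can be illustrated with a small sketch. It is not the SPM5 implementation: the double-gamma shape parameters below are assumed for illustration, the onsets are hypothetical, and the time and dispersion derivatives used in the actual model are omitted. Convolving zero-duration sticks with a response function reduces to summing shifted copies of that function, sampled at the scan times (TR = 3.12 s).

```python
import math


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, ratio=1.0 / 6.0):
    """Double-gamma response at time t (s): a positive peak minus a late undershoot."""
    if t <= 0:
        return 0.0

    def gamma_pdf(x, shape):
        # Gamma density with unit scale.
        return x ** (shape - 1) * math.exp(-x) / math.gamma(shape)

    return gamma_pdf(t, peak_shape) - ratio * gamma_pdf(t, under_shape)


def stick_regressor(onsets_s, n_scans, tr_s=3.12):
    """Predicted BOLD time course for zero-duration events, sampled once per scan."""
    scan_times = [k * tr_s for k in range(n_scans)]
    return [
        sum(double_gamma_hrf(t - onset) for onset in onsets_s)
        for t in scan_times
    ]


# Hypothetical onsets (s) for one condition within a 180-volume session.
onsets = [0.0, 3.0, 6.0, 30.0]
regressor = stick_regressor(onsets, n_scans=180)
```

One such regressor per modeled condition, together with the six movement covariates, would form the columns of the first-level design matrix.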

The anatomical localization of significant activations was assessed with reference to the standard stereotaxic atlas by superimposition of the SPM maps on one of two group templates. A T1-weighted and a MT-weighted group template were derived from averaging all subjects' normalized T1 or MT images (spatial resolution of 1 × 1 × 1 mm). While the T1-template allows anatomical localization outside the midbrain, on MT-images the SN/VTA region can be distinguished from surrounding structures as a bright stripe while the adjacent red nucleus and cerebral peduncle appear dark [Bunzeck and Duzel, 2006; Bunzeck, et al. 2007; Eckert, et al. 2004].

Note that we prefer to use the term SN/VTA and consider BOLD activity from the entire SN/VTA complex for several reasons [Duzel, et al. 2009]. Unlike early formulations of the VTA as an anatomical entity, different dopaminergic projection pathways are dispersed and overlapping within the SN/VTA complex. In particular, dopamine neurons that project to the limbic regions and regulate reward-motivated behavior are not confined to the VTA but they are distributed also across the SN (pars compacta) [Gasbarri, et al. 1994, 1997; Ikemoto, 2007; Smith and Kieval, 2000]. Functionally, this is paralleled by the fact that in humans and primates DA neurons within the SN and VTA respond to both reward and novelty [see for instance Ljungberg, et al., 1992 or Tobler, et al., 2003 for a depiction of recording sites].

RESULTS

All analyses (behavioral and fMRI) are based on trials with correct preference responses.

Experiment 1

Subjects discriminated between conditions in both contexts with high accuracy (Table I) and there were no statistically significant differences between conditions. Reaction time (Fig. 2A) analysis revealed that subjects responded fastest to familiar reward predicting stimuli (all P's < 0.007), but there was no difference between the other three conditions (novel-rewarded, novel-not-rewarded, familiar-not-rewarded; all P's > 0.05).

Table I. Behavioral results

            Experiment I                                      Experiment II
            Rewarding      Not rewarding                      Rewarding      Not rewarding
            (hits)         (correct rejections)               (hits)         (correct rejections)
Novel       0.88 (0.08)    0.91 (0.1)                         0.81 (0.09)    0.84 (0.1)
Familiar    0.9 (0.09)     0.87 (0.09)                        0.86 (0.06)    0.85 (0.09)

Table shows the hit-rate or correct rejection rate per condition for Experiment I and Experiment II. Numbers in brackets indicate one standard deviation of the mean.

Figure 2.

Behavioral results. (A) Reaction-times. In both experiments RTs were significantly faster for familiar rewarded images compared to all other conditions (all P < 0.01)—as indicated by the asterisk—but there was no other difference between conditions. (B) Recognition memory performance in Experiments 1 and 2. The bars show overall recognition memory scores (corrected hit-rate = correct remember plus correct know responses) in the memory test on the next day. Error-bars denote one standard error of the mean and asterisk indicates a statistically significant difference (P < 0.05).

Recognition memory performance–second day. Recognition memory analysis was based on both hits (remember responses, know responses following pictures previously seen during encoding), and false alarms ([FA]: remember, know to distractors). In a first step, we calculated the proportion of remember- and know-responses for old and new images (i.e., hit-rates and FA-rates) by dividing the number of hits (and FA, respectively) by the number of items per condition. Secondly, corrected hit-rates were obtained for remember-responses ([Rcorr], remember hit-rate minus remember FA-rate) and know-responses ([Kcorr], know hit-rate minus know FA-rate) (see Table II). In a planned comparison, we assessed the effect of reward on overall recognition memory (corrected hit-rate = Rcorr + Kcorr) for novel and familiar images. This revealed that reward significantly improved overall memory for novel images compared to novel not rewarded images (P = 0.036) but there was no such improvement of overall memory by reward for familiar images (P > 0.5; Fig. 2). Furthermore, the enhancing effect of reward on recognition memory for novel images was equally strong for recollection and familiarity as revealed by analysis of variance (ANOVA; no interaction between reward and recognition memory type [F(1,16) = 2.28, P > 0.15]).
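The two-step computation of corrected hit-rates can be written out as follows. This is a minimal sketch with hypothetical response counts; only the item numbers (60 old images per condition, 60 distractors) are taken from the task description, and the function name is illustrative.

```python
def corrected_rates(remember_hits, know_hits, n_old,
                    remember_fa, know_fa, n_new):
    """Return (Rcorr, Kcorr, overall) for one condition.

    Hit-rates and false-alarm (FA) rates are proportions of old and new
    items, respectively; each corrected rate is hit-rate minus FA-rate.
    """
    r_corr = remember_hits / n_old - remember_fa / n_new
    k_corr = know_hits / n_old - know_fa / n_new
    return r_corr, k_corr, r_corr + k_corr


# Hypothetical counts for one subject and one condition.
r_corr, k_corr, overall = corrected_rates(
    remember_hits=30, know_hits=12, n_old=60,
    remember_fa=3, know_fa=6, n_new=60,
)
```

Subtracting the FA-rate guards against a liberal response bias: a subject who calls many distractors "old" gains hits but loses the same proportion through false alarms.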

Table II. Recognition memory

                Novel rewarded                  Novel not-rewarded              Familiar rewarded               Familiar not-rewarded
                Rcorr           Fcorr           Rcorr           Fcorr           Rcorr           Fcorr           Rcorr           Fcorr
Experiment I    0.092 (0.016)   0.131 (0.019)   0.071 (0.019)   0.116 (0.023)   0.588 (0.044)   0.127 (0.041)   0.567 (0.039)   0.159 (0.041)
Experiment II   0.070 (0.021)   0.088 (0.016)   0.063 (0.019)   0.067 (0.018)   0.396 (0.049)   0.167 (0.036)   0.322 (0.039)   0.183 (0.031)

Table shows corrected recollection rate (Rcorr) and corrected familiarity rate (Fcorr) for all conditions and both experiments. Numbers in brackets indicate one standard error of the mean.

Experiment 2

As in Experiment 1, subjects discriminated between conditions in both contexts with high accuracy and no significant differences between conditions (Table I). As in Experiment 1, reaction-time (Fig. 2A) analysis showed responses were significantly faster for familiar reward predicting stimuli (all P's < 0.001) but there was no difference between the other three conditions (novel-rewarded, novel-not-rewarded, familiar-not-rewarded; all P's > 0.05).

Recognition memory performance–second day. In contrast to Experiment 1, recognition memory for novel rewarded images was not significantly improved compared to novel unrewarded images (neither overall recognition memory nor Rcorr/Kcorr; P > 0.05, Table II). Also in contrast to Experiment 1, in Experiment 2 recollection for familiar rewarded images was significantly enhanced compared to familiar not-rewarded images (P = 0.001, Table II) which resulted in enhanced overall memory (Rcorr + Kcorr) for familiar rewarded compared to familiar not-rewarded images (there was no significant difference between the corrected know-rates of familiar rewarded and familiar not-rewarded images, P > 0.05). Furthermore, data in Table II and Figure 2B show that overall memory performance was considerably lower in Experiment 2 compared to Experiment 1, which was supported by a mixed effects ANOVA.

fMRI results−reward based recognition memory test. First, we analyzed fMRI data using a 2 × 2 ANOVA with factors “novelty” (novel, familiar) and “reward” (reward, no reward). We found a main effect of novelty in bilateral medial orbitofrontal cortex (mOFC) and the right MTL including the hippocampus and rhinal cortex (Fig. 3; see Supporting Information Table S1 for a complete list of activated brain structures). A main effect of reward was observed within the bilateral caudate, septum/fornix, ventral striatum (ncl. accumbens), bilateral mOFC and medial prefrontal cortex (mPFC) (Fig. 4; Supporting Information Table S1). These two main effects were exclusively masked with the effects of interactions (exclusive masking, P = 0.05, uncorrected) to identify only those regions that expressed main effects in the absence of any interaction.

Figure 3.

fMRI results Experiment 2. A main effect of novelty was observed within the right hippocampus (A), rhinal cortex (B) and medial OFC (C). Activation maps were superimposed on a T1-weighted group template (see methods), coordinates are given in MNI space and color bar indicates T-values (results thresholded at P = 0.001, uncorrected). Error-bars denote one standard error of the mean and asterisk indicates a statistically significant difference (P < 0.05). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 4.

fMRI results Experiment 2. A main effect of reward was observed within the striatum, including ncl. accumbens (A) and caudate ncl. (C), septum/fornix (B), medial PFC (C), and medial OFC (D). Activation maps were superimposed on a T1-weighted group template (see methods), coordinates are given in MNI space and color bar indicates T-values (results thresholded at P = 0.001, uncorrected). Error-bars denote one standard error of the mean and asterisk indicates a statistically significant difference (P < 0.05). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

To test our two predictions regarding the exploration bonus hypothesis, we performed two additional analyses. First, within brain regions that showed a main effect of reward we analyzed which areas also showed a stronger response for novel rewarded than familiar rewarded stimuli (i.e., conjunction). This analysis did not yield any significant results, suggesting that there were no brain regions where being novel led to a stronger reward prediction response than being familiar. Secondly, we assessed the interaction (F-contrast) between novelty and reward. Such an interaction was expressed within several brain regions including right hippocampus, inferior frontal gyrus and right OFC (Supporting Information Table S1, Fig. 5). Specifically, the hippocampus showed the expected interaction pattern with higher responses for stimuli presented in the context where being novel is rewarded (T-contrast). That is, hippocampal activity was higher for novel rewarded stimuli and familiar unrewarded stimuli (note that both of these stimuli were presented in the same context) than for novel unrewarded and familiar rewarded stimuli (again, note that both of these stimuli were presented in the same context). Planned post hoc comparison confirmed statistically significant differences between novel-rewarded vs. novel not-rewarded (P < 0.025) and familiar rewarded vs. familiar not-rewarded (P < 0.01; Fig. 5).
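The logic of the 2 × 2 design can be made concrete with contrast weights over the four conditions. The sketch below is illustrative only: the weights follow the standard coding of a 2 × 2 factorial design, and the condition estimates are hypothetical numbers chosen to mimic the described hippocampal pattern, not values from the study.

```python
CONDITIONS = [
    "novel_rewarded",
    "novel_not_rewarded",
    "familiar_rewarded",
    "familiar_not_rewarded",
]

# Weights in the order of CONDITIONS.
CONTRASTS = {
    "novelty": [1, 1, -1, -1],      # novel > familiar
    "reward": [1, -1, 1, -1],       # rewarded > not rewarded
    # Positive when responses are higher in the context where novelty is
    # rewarded (novel rewarded and familiar not-rewarded share that context).
    "interaction": [1, -1, -1, 1],
}


def apply_contrast(weights, estimates):
    """Weighted sum of condition estimates for one contrast."""
    return sum(w * b for w, b in zip(weights, estimates))


# Hypothetical condition estimates for a single voxel.
estimates = [1.0, 0.2, 0.3, 0.9]
effects = {name: apply_contrast(w, estimates) for name, w in CONTRASTS.items()}
```

With these made-up estimates the interaction term is large while the novelty main effect is near zero, which is the signature of a response driven by context rather than by the novelty status of the individual stimulus.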

Figure 5.

fMRI results Experiment 2. An interaction between novelty and reward was observed within the hippocampus and OFC. Within the hippocampus responses to familiar not-rewarded items were enhanced compared to familiar-rewarded items if presented in context with novel-rewarding items. Activation maps were superimposed on a T1-weighted group template (see methods), coordinates are given in MNI space and color bar indicates F-values (results thresholded at P = 0.001, uncorrected). Error-bars denote one standard error of the mean and asterisk indicates a statistically significant difference (P < 0.05). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

It should be noted that the activation pattern for the interaction between novelty and reward (36, −14, −16; Fig. 5) is adjacent but not identical to the activation of a main effect of novelty, which is also located within the right hippocampus (28, −14, −20; Fig. 3). Such a differential activation pattern accords with our hypotheses, cell recordings in animals, and human fMRI studies. For instance, animal research has shown that different hippocampal neurons can respond to different features (such as novelty or familiarity) within the same task [Brown and Xiang, 1998]. In line with these observations, we have shown in humans that spatially distinct hippocampal activations can reflect differential properties of novelty processing, namely absolute novelty signals, adaptively scaled novelty signals, and novelty prediction errors ([Bunzeck, et al. 2010], Supporting Information Fig. S4). Johnson et al. (2008) reported that spatially very close clusters of activation showed very different responses to novelty: one cluster showed a categorical difference between new items and old items whereas the other cluster showed a linear response decrement as a function of increased stimulus familiarity. However, to further exclude the possibility of a false positive result we applied small volume correction to both activation patterns using the right anterior hippocampus as the search volume. The analysis reached statistical significance (P ≤ 0.05; FWE-corrected).

Finally, we sought to link reward-related memory improvement to regional brain activity patterns using regression analyses (all analyses were performed with data from Experiment 2). First, the contrast novel rewarded vs. novel not-rewarded images was entered into a second-level simple regression analysis using individual memory improvement by reward as regressor (Δ corrected hit-rate = corrected hit-rate [Rcorr + Fcorr] for novel rewarded − corrected hit-rate for novel not-rewarded). This analysis was motivated by our initial observation of improved overall memory (i.e., recollection and familiarity) for novel images by reward (Experiment 1) and previous similar findings [Adcock, et al. 2006; Krebs, et al. 2009; Wittmann, et al. 2005]. This revealed a significant positive correlation between hemodynamic responses (HR) and recognition memory improvement within the SN/VTA, right anterior MTL (junction of rhinal cortex hippocampus/amygdala) and right ventral striatum (Fig. 6, Supporting Information Table S1 for all activated regions). In a second regression analysis, the same contrast for familiar images (familiar rewarded vs. familiar not-rewarded) was correlated with individual improvement in recollection rate (behaviorally, recollection rate was significantly enhanced for familiar rewarded compared to familiar not-rewarded images, but there was no improvement in Fcorr). Since RTs for familiar rewarded images were significantly faster than for familiar not-rewarded images, the RT difference for each subject was also entered as a regressor. Here, we were only interested in those regions that showed a significant positive correlation between HR differences (familiar rewarded vs. familiar not-rewarded) and increased recollection rate (familiar rewarded vs. familiar not-rewarded), but not in those that also showed any correlation with RT improvement.
This analysis revealed similar effects to the first regression analysis, namely, a significant correlation between HR and reward-related recollection-rate improvement within the ventral striatum (left), right hippocampus and left rhinal cortex (Fig. 7, Supporting Information Table S1), but no correlation within the SN/VTA. A statistically more sensitive post hoc analysis of the SN/VTA voxel [4, −18, −16] that showed a significant correlation for novel images also revealed no correlation between hemodynamic responses and improved recollection rate for familiar images (r = −0.07, P = 0.811).
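As an illustration of the behavioral regressor and of what the simple regression tests at a single voxel, the following minimal sketch computes Δ corrected hit-rate per subject and correlates it with the rewarded-minus-unrewarded difference in parameter estimates. It is not the analysis pipeline used here: the dictionary keys, function names, and the use of a plain Pearson coefficient are illustrative assumptions, relying only on the fact that a regression with a single covariate tests the same linear relationship as a Pearson correlation.

```python
from math import sqrt


def corrected_hit_rate(r_corr, f_corr):
    """Overall corrected hit-rate: corrected recollection plus corrected familiarity."""
    return r_corr + f_corr


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (both need non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def memory_brain_correlation(subjects):
    """Correlate the reward-related memory gain with the reward-related beta difference.

    `subjects` is a list of dicts with hypothetical keys (assumed for this sketch):
    rcorr_rewarded, fcorr_rewarded, rcorr_unrewarded, fcorr_unrewarded,
    beta_rewarded, beta_unrewarded (betas taken from one voxel or region).
    """
    delta_memory = [
        corrected_hit_rate(s["rcorr_rewarded"], s["fcorr_rewarded"])
        - corrected_hit_rate(s["rcorr_unrewarded"], s["fcorr_unrewarded"])
        for s in subjects
    ]
    delta_beta = [s["beta_rewarded"] - s["beta_unrewarded"] for s in subjects]
    return pearson_r(delta_memory, delta_beta)
```

For the second regression, the same function could be applied with the recollection-rate difference in place of Δ corrected hit-rate; the additional RT regressor would require a multiple regression and is not covered by this sketch.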

Figure 6.

fMRI results Experiment 2–regression analysis. A significant correlation between recognition memory improvement for novel rewarded compared to novel not-rewarded images (Δ corrected hit-rate) and hemodynamic response differences between novel rewarded and novel not-rewarded images (parameter estimates, beta) was observed in bilateral medial SN/VTA (A), right MTL (B), and ventral striatum (C). Activation maps were superimposed on an MT (A) and T1-weighted (B, C) group template (see methods); coordinates are given in MNI space and the color bar indicates T-values (results thresholded at P = 0.005, uncorrected). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Figure 7.

fMRI results Experiment 2–regression analysis. A significant correlation between recollection rate improvement for familiar rewarded compared to familiar not-rewarded images (Δ recollection rate) and hemodynamic response differences between familiar rewarded and familiar not-rewarded images (parameter estimates, beta) was observed in MTL including right hippocampus (A) and left rhinal cortex (B), and left ventral striatum (C). Activation maps were superimposed on a T1-weighted group template (see methods), coordinates are given in MNI space and color bar indicates T-values (results thresholded at P = 0.005, uncorrected). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

DISCUSSION

Our finding that a cluster of voxels within the MTL (including hippocampus and rhinal cortex) showed a main effect of novelty but not a main effect of reward (Fig. 3A,B) supports the idea that the hippocampus and rhinal cortex can signal novelty independently of reward value. This finding accords with a wide range of animal and human studies suggesting that both the hippocampus and rhinal cortex are sensitive to novelty [Brown and Xiang, 1998; Dolan and Fletcher, 1997; Knight, 1996; Lisman and Grace, 2005; Strange, et al. 1999; Yamaguchi, et al. 2004]. However, another region within the hippocampus also showed the hypothesized interaction of novelty and reward (Fig. 5), with significantly enhanced hemodynamic responses to familiar unrewarded images if presented in a context where being novel was rewarded.

This interaction of novelty and reward in the hippocampus provides evidence for our second prediction of a contextual effect in accordance with the exploration bonus framework (see [Sutton and Barto, 1981] for a formal description of the exploration bonus within the exploration-exploitation dilemma). Based on the notion that novelty can act as an exploration bonus for reward [Kakade and Dayan, 2002], we predicted that in a context in which being novel is rewarded there should also be enhanced exploration of the familiar stimuli (even when they are unrewarded). Compatible with this possibility, familiar stimuli elicited stronger hippocampal activity in a context where the availability of reward was signaled by being novel as compared to a context where reward was signaled by being familiar. This contextually enhanced neural activation within the hippocampus during encoding, however, did not directly translate into long-term memory, that is, better memory for familiar items when presented in context with novel reward-predicting items. Instead, recognition performance was driven by the reward-predicting status of an item both for novel (Experiment 1) and familiar (Experiment 2) stimuli (see below). This suggests that, in an experimental setting in which reward prediction and contextual novelty may both influence learning, reward prediction can exert the dominant influence.

Another prediction regarding the exploration bonus framework was not confirmed. We did not find any brain regions that exhibited a main effect of reward and at the same time significantly stronger activity for novel rewarded than familiar rewarded images. At first glance, this negative finding seems to be at odds with previous studies [Krebs, et al. 2009; Wittmann, et al. 2008]. However, in both the Krebs et al. [2009] and the Wittmann et al. [2008] studies, enhanced reward prediction for novel stimuli was found under conditions where the novelty status of stimuli was implicit and participants attended to reward contingencies. In fact, Krebs et al. reported that this enhancement was absent when participants attended to the novelty status of stimuli rather than to reward contingencies (note, however, that in Krebs et al. novelty status per se was not predictive of reward). Hence, unlike the contextual interaction between novelty and reward (Fig. 5), this aspect of the exploration bonus may be strongly task-dependent, occurring only when subjects can attend to reward contingencies without having to assess novelty. It has been suggested on the basis of rodent studies that prefrontal and hippocampal inputs compete with each other for control over the nucleus accumbens (a part of the ventral striatum) [Goto and Grace, 2008]. It is plausible that task-related attention to novelty or reward would affect such a competition.

Recognition memory scores from Experiment 1 (Fig. 2) were compatible with the exploration bonus framework in showing a reward-related behavioral enhancement of long-term memory performance for novel but not for familiar stimuli. However, the behavioral results obtained under conditions where encoding occurred in the fMRI scanner (Experiment 2) were different in that memory for familiar stimuli did show an enhancement by reward (for novel stimuli this enhancement did not reach significance). One reason for this discrepancy may be that in Experiment 1 the encoding context and the retrieval context on the next day were identical (subjects learned and were tested in the same room), whereas in Experiment 2 they were different (subjects encoded in the fMRI scanner and were tested in a testing room). It is well known that changes between encoding and retrieval context can have profound influences on memory performance [Godden and Baddeley, 1975]. Compatible with this possibility, memory performance was considerably lower in Experiment 2 than in Experiment 1 (Fig. 2). Such context effects may have also led to the discrepancy in the behavioral patterns observed in Experiments 1 and 2.

The ventral striatum (Fig. 4A) and medial prefrontal cortex (Fig. 4C,D) expressed main effects of expected reward value. In our task, reward prediction depended upon explicit novelty discrimination, and thus regions expressing expected reward value (ventral striatum, septum/fornix) must have access to information about memory for the presented picture. A likely origin of such declarative memory information is the MTL. In fact, hippocampus and rhinal cortex, as part of the MTL, not only expressed the main effect of novelty, but are also well known to send efferents to the ventral striatum and the medial prefrontal cortex (note that projections from rhinal cortex to the NAcc stem primarily from the entorhinal cortex [Friedman, et al. 2002; Selden, et al. 1998; Thierry, et al. 2000]). The precise mechanisms and computational processes that translate novelty into reward responses, however, remain unclear. This translation possibly involves the medial prefrontal cortex (including orbital parts), which, in line with previous studies [O'Doherty, et al. 2004; Ranganath and Rainer, 2003], expressed both novelty- and reward-related activation (Fig. 3C and 4C,D).

The functional implications of our results regarding the representation of novelty and reward responses in the hippocampus, SN/VTA, ventral striatum and medial PFC are summarized in Figure 8. To provide support for this model, we calculated a correlation between the activation of our regions of interest, using a Spearman correlation analysis for each subject on the deconvolved time series, to provide a group correlation coefficient R and a P-value.

Figure 8.

Schematic illustration of the functional relationship between hippocampus, Nucleus accumbens (NAcc), medial prefrontal cortex (mPFC) and substantia nigra/ventral tegmental area (SN/VTA). To provide support for this model, we calculated a correlation between the activation of our regions of interest, using a Spearman correlation analysis for each subject on the deconvolved time series which results in a group correlation coefficient R and a P-value. It should be noted that the arrows indicate assumed directionality on the basis of known projections rather than quantitatively estimated causality. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Since reward was contingent upon novelty and the sole region that represented both types of signals was the mPFC, this region is likely to be the source of novelty-based reward signaling (R = 0.09; P < 0.001). The hippocampus, on the other hand, is most likely the source of the novelty signal for the mPFC (R = 0.11; P < 0.001). This is plausible given that there are direct projections from the hippocampus to mPFC [Ferino, et al. 1987; Rosene and Van Hoesen, 1977]. It is also plausible that the mPFC reward signal is then conveyed to the NAcc (R = 0.09; P < 0.001) and the SN/VTA (R = 0.03; P = 0.08). It should be noted that the SN/VTA signal only correlated with the novelty-responsive mPFC (R = 0.03; P = 0.08) but not the reward-responsive mPFC (R = 0.007; P > 0.6). This suggests that mOFC inputs to the SN/VTA might arise more strongly from those mPFC regions associated with novelty processing rather than reward processing. Our observation that the mPFC responds to novelty and correlates with the SN/VTA signal is also compatible with the suggestion [Lisman and Grace, 2005] that the PFC is a source of a novelty signal into dopaminergic circuitry. The role of the NAcc in novelty signaling, however, still remains unclear [Duzel, et al. 2009]. That is, although we did not observe novelty signals within the NAcc, there was a significant correlation between signals in the NAcc and novelty-responsive mOFC regions (R = 0.09; P < 0.001), NAcc and novelty-responsive hippocampus regions (R = 0.15; P < 0.001), and the NAcc and SN/VTA (R = 0.19; P < 0.001). Finally, it should be noted that the arrows in our model indicate assumed directionality on the basis of known projections rather than quantitatively estimated causality.
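To make the region-to-region correlation step concrete, the following minimal sketch computes a Spearman coefficient per subject from two time series and combines the per-subject coefficients into one group value. The per-subject Spearman step follows the description given for Figure 8; the way the coefficients are pooled (averaging Fisher z-transformed values) and all function names are assumptions for illustration, since the pooling procedure is not specified here. The group P-value is left out for the same reason.

```python
from math import atanh, sqrt, tanh


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences (both need non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


def group_correlation(per_subject_series):
    """Pool per-subject Spearman coefficients (assumed scheme: mean of Fisher z values).

    `per_subject_series` is a list of (series_a, series_b) pairs, one per subject.
    atanh is undefined for |rho| == 1, so perfectly monotonic pairs are not supported.
    """
    z_values = [atanh(spearman_rho(a, b)) for a, b in per_subject_series]
    return tanh(sum(z_values) / len(z_values))
```

With long time series, very small coefficients such as those reported above can still differ reliably from zero, which is why the size of R and its P-value need to be read separately.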

Reward-related improvement of recognition memory was correlated with ventral striatum, SN/VTA and MTL activation (Fig. 6). An important aspect of hippocampal learning and plasticity is a requirement for DA in the expression of late-phase LTP (long-term potentiation) but not early-phase LTP [Frey and Morris, 1998; Frey, et al. 1990; Huang and Kandel 1995; Jay 2003; Morris 2006]. This is consistent with the view that DA is required for long-term memory consolidation, a view also supported by recent behavioral data in rodents [O'Carroll, et al. 2006]. Our data are compatible with this view in showing a correlation between long-term memory improvement through reward one day after encoding and activation within putative dopaminergic regions and hippocampus. In particular, we see a correlation for novel rewarded vs. not-rewarded items within SN/VTA, ventral striatum and hippocampus, and a correlation for familiar rewarded vs. not-rewarded items within ventral striatum and hippocampus. Given that the ventral striatum is a primary output structure of the dopaminergic midbrain (SN/VTA) [Fields, et al. 2007], our results suggest that reward-related enhancement of long-term memory through the hippocampal-SN/VTA loop is not limited to novel stimuli but also applies to familiar stimuli. In fact, it is likely that the degree of familiarity among the class of familiar stimuli (during encoding) was quite variable and that those stimuli whose encoding benefited most from reward were the least familiar (relatively most novel) ones. Therefore, it is reasonable to assume that correlations for the novel and familiar stimulus classes were driven by the same mechanisms.

We also observed a main effect of reward in the septum/fornix (Fig. 4B), a region that is likely to harbor cholinergic neurons which project to medial temporal structures. Interestingly, animal studies show that, similar to DA neurons, cholinergic neurons (in the basal forebrain) respond to novelty and habituate when stimuli become familiar [Wilson and Rolls, 1990b]. However, in tasks in which familiar stimuli predict reward, the activity of basal forebrain neurons reflects reward prediction rather than novelty status [Wilson and Rolls, 1990a]. Our findings (Fig. 4B) are compatible with the observation of Wilson and Rolls (1990a), although we cannot say to what extent these activations actually involve responses of cholinergic neurons.

Taken together, we replicate recent observations that activity of the ventral striatum, SN/VTA, hippocampus and rhinal cortex correlates with reward-related memory enhancement, compatible with the hippocampus-SN/VTA loop. Importantly, our findings provide new insights into the functional properties of the components of this loop. In a task in which the novelty status of an item predicted reward, the hippocampus preferentially expressed the novelty status, whereas ventral striatum activity reflected the reward value independently of novelty status. The medial PFC (including orbital parts) was likely to be the site where novelty and reward signals were integrated, because it expressed both novelty and reward effects and is known to be connected with the hippocampus and ventral striatum. Finally, in line with the exploration bonus theory [Kakade and Dayan, 2002], novel reward-predicting stimuli exerted contextually enhancing effects on familiar (not rewarding) items, which were expressed as enhanced neural responses within the hippocampus.

Acknowledgements

We would like to thank K. Herriot for support in data acquisition.
