Keywords:

  • novelty;
  • memory;
  • reward;
  • mesolimbic system;
  • prediction error

Abstract

  1. Top of page
  2. Abstract
  3. INTRODUCTION
  4. MATERIALS AND METHODS
  5. RESULTS
  6. DISCUSSION
  7. REFERENCES
  8. Supporting Information

Declarative memory is remarkably adaptive in the way it maintains sensitivity to relative novelty in both unknown and highly familiar environments. However, the neural mechanisms underlying this contextual adaptation are poorly understood. On the basis of emerging links between novelty processing and reinforcement learning mechanisms, we hypothesized that responses to novelty will be adaptively scaled according to expected contextual probabilities of new and familiar events, in the same way that responses to prediction errors for rewards are scaled according to their expected range. Using functional magnetic resonance imaging in humans, we show that the influence of novelty and reward on memory formation in an incidental memory task is adaptively scaled and furthermore that the BOLD signal in orbital prefrontal and medial temporal cortices exhibits concomitant scaled adaptive coding. These findings demonstrate a new mechanism for adjusting gain and sensitivity in declarative memory in accordance with contextual probabilities and expectancies of future events. Hum Brain Mapp, 2010. © 2010 Wiley-Liss, Inc.


INTRODUCTION

Novelty is a fundamental signal associated with attracting attention, promoting memory encoding and modifying goal-directed behavior [Knight,1996; Lisman and Grace,2005; Mesulam,1998; Sokolov,1963]. The influential “novelty encoding hypothesis” [Tulving et al.,1996] suggests that there is a direct relationship between the probability of long-term encoding of information and its degree of novelty. How the novelty of information is quantified in terms of a neural encoding signal and to what extent this is separable from the neural representation of familiarity are not fully understood.

There are at least three different sets of results on the coding and effects of novelty. First, neural responses to stimuli are widely observed to decrease monotonically with the number of previous exposures, as in repetition suppression [Desimone,1996; Miller and Desimone,1994]. This provides an “absolute” signal of novelty. Second, during tasks involving old/new discrimination, there are “binary” neural responses that do not discriminate different levels of familiarity, but simply old from new [Johnson et al.,2008]. Furthermore, binary and absolute signals lie in close anatomical proximity in the hippocampus. Finally, it is very well established that neural novelty responses are modified by the likelihood of expected occurrences of new events in a given context [e.g., Chong et al.,2008; Duzel and Heinze,2002; Knight,1996; Lisman and Grace,2005; Schott et al.,2005; Yamaguchi et al.,2004; for a review, see Lisman and Grace [2005]]. However, the nature of these modifications is incompletely understood. In particular, it is unclear whether any of these signals modulate how much is encoded about stimuli in different environments and how such a modulation could be achieved (see also Brown and Xiang [1998] and Yonelinas et al. [2005]).

Here, we address the critical questions as to how the magnitude of neural novelty responses and the behavioral impact on incidental memory is scaled in contexts with different expected probabilities of new and familiar events and the extent to which this scaling coexists with an absolute representation of the familiarity status of stimuli. We hypothesized that, as for prediction error signals for reward [Tobler et al.,2005], novelty signals would be scaled according to contextual probabilities and also according to predictions regarding the relative novelty and familiarity of items that need to be explicitly discriminated in memory. This hypothesis, which sits comfortably with the theoretical suggestion that novelty acts to motivate exploration of the environment to harvest rewards [Kakade and Dayan,2002], arises out of converging evidence from human and nonhuman primate studies that point to shared properties of novelty and reward.

In anatomical terms, the prefrontal cortex, hippocampus, and substantia nigra/ventral tegmental area (SN/VTA) have all been implicated in both novelty processing and certain forms of reinforcement learning [Duzel et al.,2009; Lisman and Grace,2005; Ljungberg et al.,1992; Montague et al.,2004; Schultz,1998,2006]. For example, the human SN/VTA is activated by actual reward [Knutson and Cooper,2005] and novelty [Bunzeck and Duzel,2006; Bunzeck et al.,2007; Wittmann et al.,2005], as well as by cues predicting their occurrence [Knutson and Cooper,2005; O'Doherty et al.,2002; Wittmann et al.,2005,2007]. A recent network model of dopaminergic modulation of hippocampal plasticity for novel events, termed the hippocampal-VTA loop model [Lisman and Grace,2005], proposes that novelty signals are conveyed from the hippocampus to the SN/VTA through mesolimbic circuitry including the ventral striatum and ventral pallidum. This novelty signal is integrated with information from prefrontal cortex to elicit SN/VTA activation in response to novelty, with ensuing release of dopamine (DA) in the hippocampus.

By contrast with novelty responses, much is known about contextual adaptation of neural responses associated with reward magnitude. One, by now conventional, form of adaptation is seen in the firing rates of midbrain DA neurons, which encode a temporally sophisticated prediction error for the delivery of reward [Montague et al.,2004; Schultz,1998]. Thus, in a context in which there are two equally probable rewarding outcomes, delivery of the lower reward will cause below-baseline activity, whereas delivery of the higher reward leads to an increase in firing. The second form of adaptation (which we call adaptive scaling) is less conventional, with increases and decreases in neural firing above and below baseline remaining constant across a range of actual differences between the quantities of predicted and actual reward [Tobler et al.,2005]. Adaptive scaling involves the SN/VTA DA system adjusting its gain and sensitivity in accordance with contextual expectations.

On the basis of the commonalities between the functional architectures of reward and novelty, we conjectured that the same two forms of neural adaptation seen for reward may also characterize novelty. As for rewards, adaptive scaling of novelty should occur under explicit task instructions, that is when novelty assessment is required and the novelty status of stimuli is not incidental to the current task goal (e.g., see Bunzeck and Duzel [2006]). While adaptive scaling of rewards has been suggested as allowing efficient coding of appetitive prediction errors [Barlow,1961], we suggest that adaptive scaling of novelty arranges for efficient encoding into long-term memory of information, perhaps via the influence on long-term plasticity of DA release in the hippocampus [Frey and Morris,1998; Morris,2006; O'Carroll et al.,2006]. Adaptive scaling of novelty would have the critical advantage over absolute coding of preventing memory capacity from being overwhelmed in novel environments (by comparatively reducing gain and/or sensitivity), while allowing efficient memory updating in familiar environments (by enhancing gain and sensitivity). As such, adaptive scaling is a generalization and modification of the “novelty encoding hypothesis” [Tulving et al.,1996] according to which new items are preferentially encoded over familiar items [Tulving and Kroll,1995]. To put it another way, rather than being an algorithmic question of mere efficient coding of a signal to optimize the use of a limited bandwidth channel, adaptively scaled novelty responses could be part of a computationally sophisticated process ensuring efficient encoding of stimuli into memory.

To test the hypothesis that novelty and reward both express contextual adaptive scaling, we conducted two fMRI studies in healthy young adults using an adaptive reward coding paradigm adapted from a study in nonhuman primates [Tobler et al.,2005]. The first fMRI study was designed to replicate adaptive scaling of reward in humans; the second was designed to assess whether novelty is also scaled adaptively under the same experimental conditions. We reasoned that adaptive scaling of novelty and reward should share components of the hippocampal-SN/VTA loop and extracted these common anatomical elements through a direct comparison of both fMRI studies. Finally, we assessed whether under conditions in which reward magnitude is scaled adaptively, long-term memory for novel events is likewise modulated adaptively. This finding would directly link adaptive scaling to long-term plasticity.

MATERIALS AND METHODS

None of the tested participants reported a history of neurological, psychiatric, or medical disorders or any current medical problems. All experiments were run with each subject's written informed consent and according to the local ethics clearance (University College London, United Kingdom).

Experiment I

Subjects

Fourteen healthy right-handed adults (age range 19–32 years; mean = 24.9 years; SD = 3.95 years; seven males and seven females) with normal or corrected-to-normal acuity were recruited for paid participation.

Experimental design and task

Four 8-min blocks of a reward prediction paradigm were completed. Here, one of three possible cues (colored squares) appeared for 500 ms, informing the subjects which of two reward values could potentially follow. One cue predicted a high reward (£1) or a medium reward (£0.50); a second cue predicted a medium (£0.50) or a low reward (£0.15); and a third cue predicted a high (£1) or a low reward (£0.15) (Fig. 1A). One of the two possible reward values was presented with equal probability (50:50) 1,625–4,875 ms after the cue, followed by a response/fixation period of 3,250–6,500 ms (Fig. 1B). Using their index or middle finger, the subjects indicated whether they saw the higher or the lower of the two possible reward values (the finger used to indicate the higher or lower reward and the color coding of the reward cues were counterbalanced across subjects). The order of the stimuli and the timing (intertrial interval and interstimulus interval) were optimized for event-related fMRI, allowing the hemodynamic responses for cues and outcome to be differentiated [Hinrichs et al.,2000].

Figure 1. Experimental designs. One of three possible cues (colored squares) indicated either which of two possible reward values (Experiments I and II) (A, C) or which of two possible degrees of novelty associated with a scene picture (Experiment III) (D, E) could follow with equal probability. Following the reward (Experiments I and II) (A–C) or scene picture (Experiment III) (D, E), subjects indicated whether they saw the higher or the lower of the two possible reward values (Experiments I and II), or the contextually more novel or familiar of two possible images (Experiment III). Additionally, in Experiment II, the reward value was followed by a novel scene picture, which had to be correctly classified as indoor or outdoor to receive the reward (C). fMRI was acquired during Experiments I and III. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Prior to the experiment, the subjects were instructed to respond as quickly and as correctly as possible and were told that only correct responses would lead to the reward. After the experiment, the volunteers were paid 10% of all earnings, which could be a maximum of £16. Feedback as to correct responses was given after the completion of the entire experiment, but not on a trial-by-trial basis.

Experiment II

Subjects

Sixteen healthy right-handed adults (age range 19–33 years; mean = 23.63 years; SD = 4.22; seven males and nine females) with normal or corrected-to-normal acuity were recruited for paid participation.

Experimental design and task

This behavioral experiment was a modified version of Experiment I, run in front of a computer. One second after the reward a scene picture was presented, and subjects additionally had to discriminate its indoor/outdoor status (Fig. 1C). To maximize stimulus reward associations, and since fMRI was not being used, the timing between cue, reward, and picture was fixed. The subjects were told that they would receive the reward only if both decisions (higher vs. lower possible reward and indoor vs. outdoor) were made correctly.

One day later, the subjects performed an incidental recognition memory test following the “remember/know” procedure [Tulving,1985]. Here, the 288 previously seen pictures (48 pictures per category) were presented together with 48 new distracter pictures at the center of the screen, in random order. The subject first made an “old/new” decision for each individually presented picture using their right index or middle finger. Following a “new” decision, subjects were prompted to indicate whether they were confident (“certainly new”) or unsure (“guess”), again using their right index or middle finger. After an “old” decision, subjects were prompted to indicate whether they were able to remember something specific about seeing the scene at study (“remember” response), just felt familiarity with the picture without any recollective experience (“familiar” response), or were merely guessing that the picture was an old one (“guess” response). The subject had 4 s to make each of the two judgments, and there was a break of 15 s after every 84 pictures.
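The remember and know responses collected in this test are later converted into recollection and familiarity estimates (see Results, Experiment II), with Fcorr = Kcorr/(1 − Rcorr). A minimal sketch of that conversion follows. It assumes that “corrected for false alarms” means subtracting the false-alarm rate from the hit rate; the function name and the response proportions in the example are hypothetical and are not data from the study.

```python
def corrected_estimates(p_remember_old, p_know_old, p_remember_new, p_know_new):
    """Return (Rcorr, Kcorr, Fcorr) from response proportions.

    Assumption: correction for false alarms is a simple subtraction of the
    proportion of remember/know responses given to new (distracter) pictures.
    """
    r_corr = p_remember_old - p_remember_new   # recollection estimate
    k_corr = p_know_old - p_know_new           # corrected know responses
    if r_corr >= 1:
        raise ValueError("Fcorr is undefined when Rcorr >= 1")
    # Independence assumption: a know response is only possible when the
    # item was not recollected, hence the division by (1 - Rcorr).
    f_corr = k_corr / (1 - r_corr)
    return r_corr, k_corr, f_corr


if __name__ == "__main__":
    # Hypothetical proportions for one subject and one condition.
    r, k, f = corrected_estimates(0.40, 0.30, 0.02, 0.10)
    print(f"Rcorr={r:.3f} Kcorr={k:.3f} Fcorr={f:.3f}")
```

Because Fcorr divides by the share of items that were not recollected, it is always at least as large as Kcorr whenever Rcorr is between 0 and 1.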

Experiment III

Subjects

Fourteen healthy right-handed adults (age range 19–39 years; mean = 26.1 years; SD = 5.7 years; nine females and five males) with normal or corrected-to-normal acuity were recruited for paid participation.

Experimental design and task

The experiment consisted of two phases that were performed while the subjects lay in the MRI scanner. fMRI was applied only during Phase II. In Phase I, the subjects were familiarized with 192 scene pictures: 96 pictures were shown twice and 96 pictures were shown four times. The pictures were presented for 1 s in random order, separated by a white fixation cross on gray background for 1 s (i.e. intertrial interval = 2 s). Half the scenes were indoor and half were outdoor, and subjects' task was to distinguish them as quickly and accurately as possible using their index or middle finger.

Following the design of Experiment I, in Phase II, four 8-min blocks of a novelty/familiarity-status prediction paradigm were completed. Here, one of three possible cues (colored squares) appeared for 500 ms, informing the subjects which of two degrees of picture novelty could potentially follow. One cue predicted a novel picture (not seen in Phase I) or a familiar picture (seen twice in Phase I); a second cue predicted a familiar (seen twice in Phase I) or a highly familiar picture (seen four times in Phase I); and a third cue predicted a novel picture (not seen in Phase I) or a highly familiar picture (seen four times in Phase I) (Fig. 1D). As in Experiment I, 1,625–4,875 ms after the cue one of the two possible pictures was presented with equal probability (50:50), followed by a response/fixation period of 3,250–6,500 ms (Fig. 1E). Using their index or middle finger, the subjects indicated as quickly and accurately as possible whether they saw the more novel or the more familiar of the two possible pictures (the finger used to indicate the novelty/familiarity status and the color coding of the novelty-status cues were counterbalanced across subjects). As in Experiment I, the order of the stimuli and the timing (intertrial interval and interstimulus interval) in Phase II were optimized for estimation of event-related fMRI, allowing the hemodynamic responses for cues and outcome to be distinguished [Hinrichs et al.,2000].

The scenes presented in Experiments II and III were carefully prepared. The pictures were gray-scaled and normalized to a mean gray-value of 127 and a standard deviation of 75 (grayscale 8-bit). None of the scenes depicted human beings or parts of human beings including faces in the foreground. During Experiments I and III, the stimuli were projected onto the center of a screen and the subjects watched them through a mirror system mounted on the head coil of the fMRI scanner.
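The luminance normalization described above can be sketched as a linear rescaling of pixel values. This is an illustration only: the text does not say which software was used or how out-of-range values were handled, so the rounding and the clipping to the 8-bit range [0, 255] are assumptions, and `normalize_gray` is a hypothetical helper.

```python
from statistics import fmean, pstdev


def normalize_gray(pixels, target_mean=127.0, target_sd=75.0):
    """Rescale gray values to a target mean and standard deviation.

    `pixels` is a flat sequence of gray values. Clipping to 0..255 and
    rounding to integers are assumptions; both can shift the resulting
    mean and SD slightly away from the targets.
    """
    mu = fmean(pixels)
    sd = pstdev(pixels)
    if sd == 0:
        raise ValueError("image has no contrast, cannot rescale")
    scaled = [(p - mu) / sd * target_sd + target_mean for p in pixels]
    return [min(255, max(0, round(v))) for v in scaled]


if __name__ == "__main__":
    toy_image = [10, 60, 110, 160, 210]   # hypothetical pixel values
    out = normalize_gray(toy_image)
    print(out, round(fmean(out), 2), round(pstdev(out), 2))
```

With a target SD of 75 around a mean of 127, values more than about 1.7 SD from the mean fall outside the 8-bit range, which is why some form of clipping is needed in practice.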

fMRI methods

fMRI was performed on a 3-Tesla Siemens Allegra magnetic resonance scanner (Siemens, Erlangen, Germany) with echo planar imaging (EPI) using a quadrature transceiver coil with a design based on the “birdcage” principle. In the functional session, 25 T2*-weighted images (EPI-sequence) per volume with blood oxygenation level-dependent (BOLD) contrast were obtained (matrix size: 64 × 64; 25 oblique axial slices per volume angled at −30° in the anteroposterior axis; spatial resolution: 3 × 3 × 3 mm; TR = 1,625 ms; TE = 30 ms; z-shimming prepulse gradient moment of PP = 0 mT*m⁻¹*ms⁻¹; positive phase-encoding polarity), which covered a partial volume of the brain including the prefrontal/orbitofrontal cortex, the basal forebrain, basal ganglia, the midbrain, temporal lobes, and the cerebellum (Supporting Information Fig. S5). The fMRI acquisition protocol was optimized to reduce susceptibility-induced BOLD sensitivity losses in inferior frontal regions and temporal lobe regions [Deichmann et al.,2003; Weiskopf et al.,2006]. For each subject, functional data were acquired in four scanning sessions containing 326 volumes per session. Six additional volumes per session were acquired at the beginning of each series and subsequently discarded from further analysis to allow for steady-state magnetization. Anatomical images of each subject's entire brain were collected by T1-weighted inversion recovery prepared EPI (IR-EPI) sequences (matrix size: 64 × 64; 64 slices; spatial resolution: 3 × 3 × 3 mm³), which were used to improve the spatial normalization of the partial fMRI volumes. Individual field maps were recorded using a double echo FLASH sequence (matrix size = 64 × 64; 64 slices; spatial resolution = 3 × 3 × 3 mm³; gap = 1 mm; short TE = 10 ms; long TE = 12.46 ms; TR = 1,020 ms) for distortion correction of the acquired partial EPI images [Weiskopf et al.,2006]. Using the “FieldMap toolbox” [Hutton et al.,2002,2004], field maps were estimated from the phase difference between the images acquired at the short and long TE.
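As a quick consistency check, the acquisition figures can be related to the task timing. The sketch below uses only values stated in the text (TR, volumes per session, discarded volumes, and the cue-outcome and response intervals from the task descriptions); the variable and function names are illustrative.

```python
TR_MS = 1625              # repetition time, in milliseconds
VOLUMES_ANALYZED = 326    # volumes per session entering the analysis
VOLUMES_DISCARDED = 6     # extra volumes dropped to reach steady state


def duration_s(n_volumes, tr_ms=TR_MS):
    """Acquisition time in seconds for a given number of volumes."""
    return n_volumes * tr_ms / 1000


def in_trs(interval_ms, tr_ms=TR_MS):
    """Express an interval as a multiple of the repetition time."""
    return interval_ms / tr_ms


analyzed_s = duration_s(VOLUMES_ANALYZED)
total_s = duration_s(VOLUMES_ANALYZED + VOLUMES_DISCARDED)
cue_outcome_trs = (in_trs(1625), in_trs(4875))   # bounds of the cue-outcome delay
response_trs = (in_trs(3250), in_trs(6500))      # bounds of the response period

if __name__ == "__main__":
    print(f"analyzed: {analyzed_s} s ({analyzed_s / 60:.2f} min)")
    print(f"incl. discarded volumes: {total_s} s")
    print(f"cue-outcome delay: {cue_outcome_trs} TR")
    print(f"response period: {response_trs} TR")
```

The bounds of both jittered intervals are whole multiples of the TR. Whether the jitter itself was drawn in whole-TR steps is not stated in the text.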

The fMRI data were preprocessed and statistically analyzed using the SPM5 software package (Wellcome Trust Centre for Neuroimaging, University College London, United Kingdom) and MATLAB 7 (The MathWorks, Natick, MA). All functional images were corrected for slice intensity differences with reference to the middle slice acquired in time; corrected for motion artifacts by realignment to the first volume; corrected for distortions based on the field map [Hutton et al.,2002]; and corrected for the interaction of motion and distortion using the “Unwarp toolbox” [Andersson et al.,2001; Hutton et al.,2004]. Subsequently, the partial volumes were spatially normalized to a standard T1-weighted SPM-template [Ashburner and Friston,1999] (care was taken that midbrain regions in particular aligned with the standard template). This was done by warping each subject's anatomical IR-EPI to the SPM template and applying the resulting parameters to all functional images. This procedure normalizes partial-volume EPIs with high accuracy for several reasons: (1) each individual IR-EPI is a whole-head image with the same spatial resolution and the same spatial position as each partial-volume EPI image, (2) in contrast to whole-head T1 images, IR-EPIs have almost the same distortions as functional EPIs, and (3) IR-EPIs have a higher contrast between gray and white matter than functional EPIs. Next, the normalized partial EPIs were resampled to 2 × 2 × 2 mm³ and smoothed with an isotropic 4 mm full-width half-maximum Gaussian kernel. The time series fMRI data were high-pass-filtered (cutoff = 128 s), and an AR(1)-model was used to account for possible serial correlations. For each subject, a statistical model was computed by applying a canonical hemodynamic response function (HRF) combined with time and dispersion derivatives [Friston et al.,1998].
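For readers less familiar with these settings, the smoothing kernel and filter cutoff can be converted into more familiar units. The sketch below uses the standard relation between the full-width half-maximum of a Gaussian and its standard deviation, sigma = FWHM / sqrt(8 ln 2). The constants come from the text; the names are illustrative.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full-width half-maximum."""
    return fwhm / math.sqrt(8 * math.log(2))


SMOOTHING_FWHM_MM = 4.0     # isotropic smoothing kernel
VOXEL_SIZE_MM = 2.0         # voxel edge length after resampling
HIGH_PASS_CUTOFF_S = 128.0  # high-pass filter cutoff period

sigma_mm = fwhm_to_sigma(SMOOTHING_FWHM_MM)
sigma_voxels = sigma_mm / VOXEL_SIZE_MM
cutoff_hz = 1.0 / HIGH_PASS_CUTOFF_S

if __name__ == "__main__":
    print(f"sigma = {sigma_mm:.3f} mm = {sigma_voxels:.3f} voxels")
    print(f"high-pass cutoff = {cutoff_hz:.5f} Hz")
```

The kernel is therefore narrow relative to the resampled voxel grid (a standard deviation of less than one voxel), which fits the focus on small midbrain structures.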

In both Experiments I and III, three regressors were modeled: cue onsets, novelty/reward onsets of correct trials, and incorrect trials. To capture residual movement-related artifacts, six covariates were additionally included (the three rigid-body translations and three rotations resulting from realignment) as regressors of no interest. To fit hemodynamic responses associated with reward/novelty outcomes to different effects, parametric modulation was applied [Buchel et al.,1998]. Here, on a trial-by-trial basis, the regressor of reward/novelty-related outcome activations was multiplied by (1) the adaptively scaled prediction error, (2) the absolute value/novelty status, or (3) the unscaled linear prediction error at each event and then convolved with the canonical HRF. The parametric modulator “adaptive scaling” (1) reflected a context-independent binary response (1, −1) to the higher or lower of the two possible rewards, or to the more novel or more familiar image, respectively; the parametric modulator “absolute coding” (2) reflected the absolute value of the reward outcome (1, 0.5, 0.15) or the absolute novelty status of a picture (as defined by the reciprocal of the number of repetitions; 1, 1/3, 1/5); the parametric modulator “linear prediction error” (3) reflected hemodynamic responses to reward/novelty outcomes based on the difference between the received reward and the mean expected reward as signaled by the cue (Experiment I) or the difference between the novelty status of the presented picture and the mean expected novelty as signaled by the cue (Experiment III). Thus, in Experiment I, the cue in Context 1 signaled a mean reward value of £0.75 [(£1 + £0.5)/2] and the outcome was associated with a prediction error of +£0.25 or −£0.25; in Context 2, the cue signaled a mean reward value of £0.325 [(£0.5 + £0.15)/2] and the outcome led to a prediction error of +£0.175 or −£0.175; in Context 3, the cue signaled a mean reward value of £0.575 [(£1 + £0.15)/2] and the outcome led to a prediction error of +£0.425 or −£0.425. In Experiment III, the cue in Context 1 signaled a mean novelty status of 2/3 [(1 + 1/3)/2] and the outcome was associated with a prediction error of +1/3 or −1/3; in Context 2, the cue signaled a mean novelty status of 0.2667 [(1/3 + 1/5)/2] and the outcome was associated with a prediction error of +0.0667 or −0.0667; in Context 3, the cue signaled a mean novelty status of 0.6 [(1 + 1/5)/2] and the outcome led to a prediction error of +0.4 or −0.4. In SPM5, each parametric modulator was mean-corrected and orthogonalized serially with respect to the preceding modulators (that is, the second parametric modulator was orthogonalized with respect to the first, and the third parametric modulator was orthogonalized with respect to the first and second). Thus, any shared variance between the three regressors was assigned to the first regressor. We started our analysis with this model to give the “scaled adaptive coding” regressor the most explanatory power. Additionally, we report a more conservative model in which “scaled adaptive coding” was entered as the last parametric modulator (following “linear prediction error” and “absolute coding”).
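The modulator values listed above follow from the outcome values of each context. The sketch below recomputes them with exact fractions as a check on the arithmetic. It is an illustration rather than the authors' analysis code; the dictionary layout and the function name `modulators` are assumptions made for this example.

```python
from fractions import Fraction

# Outcome values per context as (higher, lower), taken from the text.
REWARD_CONTEXTS = {
    1: (Fraction(1), Fraction(1, 2)),       # high vs. medium reward (GBP)
    2: (Fraction(1, 2), Fraction(3, 20)),   # medium vs. low reward
    3: (Fraction(1), Fraction(3, 20)),      # high vs. low reward
}
NOVELTY_CONTEXTS = {
    1: (Fraction(1), Fraction(1, 3)),       # novel vs. familiar
    2: (Fraction(1, 3), Fraction(1, 5)),    # familiar vs. highly familiar
    3: (Fraction(1), Fraction(1, 5)),       # novel vs. highly familiar
}


def modulators(contexts):
    """Map (context, 'higher'|'lower') to (adaptive, absolute, linear_pe)."""
    table = {}
    for ctx, (higher, lower) in contexts.items():
        expected = (higher + lower) / 2   # mean value signaled by the cue
        # Adaptive scaling: binary +1/-1, independent of the context's range.
        table[(ctx, "higher")] = (1, higher, higher - expected)
        table[(ctx, "lower")] = (-1, lower, lower - expected)
    return table


if __name__ == "__main__":
    for name, contexts in (("reward", REWARD_CONTEXTS), ("novelty", NOVELTY_CONTEXTS)):
        for (ctx, outcome), (adaptive, absolute, pe) in sorted(modulators(contexts).items()):
            print(f"{name} context {ctx} {outcome:>6}: adaptive={adaptive:+d} "
                  f"absolute={float(absolute):.4f} linear_PE={float(pe):+.4f}")
```

The same outcome (for example, the medium reward of £0.50) receives +1 in one context and −1 in another under adaptive scaling, while its absolute value stays fixed and its linear prediction error changes in both sign and size.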

The specific effects of each parametric modulator were tested by entering the parameter estimates (regression coefficients) resulting from each condition (parametric modulator) and subject (first-level analysis) into a second-level random-effects group analysis using one-way ANOVA (thresholded at P = 0.005 for both experiments, uncorrected). This also allowed us to extract comparable parameter estimates (fits) for each parametric modulator. Brain regions showing both scaled adaptive coding of reward and scaled adaptive coding of novelty were detected by entering the corresponding contrast images into a between-subject ANOVA and applying inclusive masking procedures (P = 0.005, uncorrected). Furthermore, for both experiments, a one-way ANOVA was used to assess the hemodynamic effects associated with cue processing. Here, two effects are reported: one expressing the main effect of cue processing and the second expressing activations associated with the mean expected reward value (Experiment I) and mean expected novelty status (Experiment III) (thresholded at P = 0.005, uncorrected). This relatively liberal threshold was chosen based on our regionally specific a priori hypotheses, which are also expressed in the acquisition of only a partial volume (covering MTL, midbrain, basal ganglia, and frontal cortex; Supporting Information Fig. S5). Furthermore, we limited our search (and reported results) to a priori defined regions of interest (ROIs), which included bilateral (1) medial temporal lobe (including hippocampus and rhinal cortex), (2) striatum (including caudate nucleus, pallidum, and putamen), (3) midbrain (including SN/VTA), (4) OFC, and (5) medial PFC.
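The logic of masking one thresholded contrast with another can be sketched with simple set operations. This is a conceptual illustration, not SPM5's implementation: the voxel labels and P values are hypothetical, and a strict threshold of P < 0.005 is assumed.

```python
def suprathreshold(p_values, alpha=0.005):
    """Voxels whose uncorrected P value falls below the threshold."""
    return {voxel for voxel, p in p_values.items() if p < alpha}


def inclusive_mask(p_main, p_mask, alpha=0.005):
    """Voxels significant in the main contrast AND in the masking contrast."""
    return suprathreshold(p_main, alpha) & suprathreshold(p_mask, alpha)


def exclusive_mask(p_main, p_mask, alpha=0.005):
    """Voxels significant in the main contrast but NOT in the masking contrast."""
    return suprathreshold(p_main, alpha) - suprathreshold(p_mask, alpha)


if __name__ == "__main__":
    # Hypothetical P values for four voxels in two contrasts.
    reward = {"v1": 0.001, "v2": 0.004, "v3": 0.2, "v4": 0.0005}
    novelty = {"v1": 0.003, "v2": 0.05, "v3": 0.001, "v4": 0.002}
    print("shared:", sorted(inclusive_mask(reward, novelty)))
    print("reward only:", sorted(exclusive_mask(reward, novelty)))
```

Inclusive masking keeps only voxels that pass the threshold in both contrasts, so it identifies regions common to the reward and novelty effects. Exclusive masking does the opposite and yields non-overlapping maps.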

The anatomical localization of significant activations was assessed with reference to the standard stereotaxic atlas by superimposition of the SPM maps on a standard brain template (Montreal Neurological Institute) provided by SPM5. To verify the anatomical localization of structures within the midbrain, the activation maps were superimposed on a magnetization transfer (MT) template, which was derived from averaging normalized MT-images of 33 young adults [Bunzeck and Duzel,2006]. On MT-images, the SN/VTA region can be distinguished from surrounding structures as a white stripe, while the adjacent red nucleus appears dark [Bunzeck and Duzel,2006; Bunzeck et al.,2007]. Note that we prefer to use the term SN/VTA and consider BOLD activity from the entire SN/VTA complex for several reasons [Duzel et al.,2009]. Unlike early formulations of the VTA as an anatomical entity, different dopaminergic projection pathways are dispersed and overlapping within the SN/VTA complex. In particular, DA neurons that project to the limbic regions and regulate reward-motivated behavior are not confined to the VTA, but are also distributed across the SN (pars compacta) [Gasbarri et al.,1994,1997; Ikemoto,2007; Smith and Kieval,2000]. Functionally, this is paralleled by the fact that in humans and primates DA neurons within the SN and VTA respond to both reward and novelty (see for instance Ljungberg et al. [1992] or Tobler et al. [2003] for a depiction of recording sites).

RESULTS

Experiment I

Behaviorally, subjects displayed rapid responding (RT < 900 ms) and high accuracy (d' > 4.4) (for behavioral results, see Supporting Information Table S1). We analyzed neural activity associated with reward outcomes using parametric modulation [Buchel et al.,1998]. This approach allows hemodynamic responses to be fitted to the different parameters of the experiment. Here, we used three parametric modulators to identify brain regions (1) expressing an adaptively scaled prediction error response (adaptive scaling), (2) correlating with the absolute value of the reward outcome (absolute coding), or (3) responding according to the unscaled linear prediction error (see Fig. 2 and Materials and Methods). We started our analysis with this model to give the “scaled adaptive coding” regressor most explanatory power.
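Why the order of the modulators matters can be illustrated with a small sketch of serial orthogonalization. This is not SPM5's code: it is a plain Gram-Schmidt procedure, assumed here to mirror the behavior described in Materials and Methods, applied to one hypothetical trial of each of the six reward outcome types.

```python
def mean_center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def serial_orthogonalize(columns):
    """Mean-center each column and remove its projection on all earlier ones."""
    result = []
    for col in columns:
        residual = mean_center(col)
        for prev in result:
            denom = dot(prev, prev)
            if denom > 0:
                coef = dot(residual, prev) / denom
                residual = [r - coef * p for r, p in zip(residual, prev)]
        result.append(residual)
    return result


# One trial per outcome type: contexts 1-3, higher then lower outcome.
ADAPTIVE = [1, -1, 1, -1, 1, -1]
ABSOLUTE = [1.0, 0.5, 0.5, 0.15, 1.0, 0.15]
LINEAR_PE = [0.25, -0.25, 0.175, -0.175, 0.425, -0.425]

if __name__ == "__main__":
    first = serial_orthogonalize([ADAPTIVE, ABSOLUTE, LINEAR_PE])
    last = serial_orthogonalize([LINEAR_PE, ABSOLUTE, ADAPTIVE])
    print("adaptive entered first, sum of squares:", dot(first[0], first[0]))
    print("adaptive entered last, sum of squares:", dot(last[2], last[2]))
```

When entered first, the adaptive-scaling modulator keeps all of its variance. When entered last, it retains only the part that the linear prediction error and absolute coding cannot account for, which is what makes the second model the more conservative test.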

Figure 2. Parametric modulation. In both fMRI experiments, hemodynamic responses at outcome of reward (left, Experiment I) and novelty (right, Experiment III) were analyzed using parametric modulation. The parametric modulators tested expressed “scaled adaptive coding,” “absolute coding,” and “linear prediction error.” The parametric modulator of “adaptive scaling” reflects a binary response (in other words a scaled prediction-error signal) to the higher versus lower of the two possible rewards, or more novel versus more familiar of images; the parametric modulator for “absolute value” reflects either the absolute value of the reward outcome or the absolute novelty status of a picture (as defined by the reciprocal of the number of repetitions); the parametric modulator of a “linear prediction error” reflects the unscaled prediction error responses to reward/novelty outcomes based on the difference between the received reward and the mean expected reward as signaled by the cue (Experiment I), or the difference between the novelty status of the presented image and the mean expected novelty status as signaled by the cue (Experiment III; see Materials and Methods). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

We found that adaptive scaling of reward was expressed in bilateral ventral striatum, bilateral MTL including hippocampus and rhinal cortex, and bilateral medial prefrontal cortex (mPFC) including sectors of orbital prefrontal cortex (mOFC) (see Fig. 3) (for a full list of activated brain regions, see Supporting Information Table S3). Although activity in parts of the mPFC, OFC, and MTL also correlated with the actual reward level, the linear prediction error, or both (see Supporting Information Table S3), it should be noted that the reported activations for the three parametric modulators did not overlap, as a result of an exclusive masking procedure.

Figure 3. fMRI results (Experiment I). Hemodynamic responses for reward outcome expressed scaled adaptive coding in medial temporal lobes including hippocampus and rhinal cortex (A), bilateral ventral striatum (B), and the medial and orbital prefrontal cortex (C). Note that parameter estimates reflect the fit of the different parametric modulators (“scaled adaptive coding,” “absolute coding,” “linear PE”); within the depicted regions only the regressor for “scaled adaptive coding” significantly explained variance of the hemodynamic responses (greater than zero, P < 0.05; error bars denote one standard error of the mean). Furthermore, direct comparison shows significant differences between parametric modulators expressing “scaled adaptive coding” and “linear PE” or “absolute coding” (P < 0.05, two-tailed). Activation maps were superimposed on a T1-weighted MNI standard brain, coordinates are given in MNI space, and color bar indicates T-values. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

In a second model, “adaptive scaling” was entered as the last parametric modulator (following “linear prediction error” and “absolute coding,” see Materials and Methods); this model also revealed activations for “adaptive scaling” of reward in ventral striatum, hippocampus, and prefrontal cortex (Supporting Information Fig. S1). The hemodynamic responses associated with cue processing are presented as Supporting Information.

The results of Experiment I reveal that components of the human reward processing circuitry, specifically the nucleus accumbens, mOFC, and mPFC [O'Doherty,2004], show characteristic features of adaptation previously described in the activity of DA neurons as measured in nonhuman primates [Tobler et al.,2005]. As for the linear temporal difference prediction error, only the higher of two predicted rewards engendered enhanced neural activation, whilst the lower of the two possible rewards led to decreased activation (see also Cromwell et al. [2005], Elliott et al. [2008], Nieuwenhuis et al. [2005], Tremblay and Schultz [1999], and Tsujimoto and Sawaguchi [2007]). This pattern suggests that the same reward value (£0.50) was associated with a hemodynamic decrease in one context and an increase in another. More importantly, the magnitude of the hemodynamic response to reward outcome did not depend on the actual difference between the mean expected reward and the actual reward (linear prediction error), but was scaled to remain similar across contexts. This replicates in humans a previous finding from monkeys of a change in gain and sensitivity during reward processing determined by the range of likely reward prediction errors [Tobler et al.,2005]. Together, these two findings provide evidence for scaled adaptive coding as a mechanism of reward processing in humans.

The finding that MTL structures, including both hippocampi and rhinal cortex, expressed scaled adaptive processing to reward outcome (Fig. 3A) extends an observation that MTL contributes to certain forms of reinforcement learning [Devenport et al.,1981; Holscher et al.,2003; Ploghaus et al.,2000; Purves et al.,1995; Rolls and Xiang,2005; Tabuchi et al.,2000; Weiner,2003] and strengthens links between reinforcement learning and declarative long-term memory in humans [Adcock et al.,2006; Wittmann et al.,2005]. On the basis of this finding, we next tested whether scaled adaptive processing of reward in MTL would impact long-term memory encoding of novel information when presented in conjunction with a reward outcome (Experiment II).

Experiment II

Subjects responded quickly and accurately during encoding (see Supporting Information Table S1). Recognition memory analysis was based on both hits (remember responses and know responses following pictures previously seen during encoding) and false alarms (FA: remember or know responses to distracters). Proportions of “remember” and “know” responses to novel and old pictures of each context as well as remember estimates (Rcorr) and familiarity estimates (Fcorr) are listed in Supporting Information Table S2. Both Rcorr and Fcorr were calculated based on a model according to which recollection and familiarity are independent processes [Yonelinas and Jacoby,1995]. According to this model, remember estimates are “remember” responses corrected for false alarms, and familiarity estimates are derived by dividing proportions of “know” responses (corrected for false alarms, Kcorr) by 1 − Rcorr [Fcorr = Kcorr/(1 − Rcorr)] [Yonelinas and Jacoby,1995]. Rcorr and Fcorr were submitted to a 3 × 2 × 2 ANOVA with the factors “reward context” (high–medium; medium–low; high–low), “relative reward outcome” (higher or lower of the two possible rewards), and “recognition memory” (Rcorr, Fcorr). This analysis revealed a significant main effect of “relative reward outcome” [F(1,15) = 12.83, P = 0.003] in the absence of any other main effect or interaction (all P > 0.3). A post hoc t-test across contexts and Rcorr/Fcorr showed significantly higher memory performance for images that followed the higher of two possible rewards compared to images following the lower possible reward (T = 3.6, P < 0.005). Thus, consistent with our demonstration of adaptively scaled hemodynamic responses, memory was significantly better for images associated with the higher of two possible outcomes both for remember and know responses, an effect that was independent of the actual value of reward (see Fig. 4). An analysis using Rcorr and Kcorr revealed similar results, i.e., a main effect of “relative reward outcome” (P < 0.005) in the absence of any other main effects or interactions (P > 0.4).
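The relation Fcorr = Kcorr/(1 − Rcorr) can be made concrete with a minimal sketch. It assumes that “corrected for false alarms” means subtracting the corresponding false-alarm proportion from the hit proportion; the text does not state the exact correction. The function name and the example proportions are hypothetical, not data from the study.

```python
# Sketch only: remember/know estimates under the independence assumption.
# Assumption: the false-alarm correction is a simple subtraction of rates.

def corrected_estimates(remember_hits, know_hits, remember_fa, know_fa):
    """All arguments are proportions between 0 and 1."""
    r_corr = remember_hits - remember_fa
    k_corr = know_hits - know_fa
    if r_corr >= 1.0:
        raise ValueError("Fcorr is undefined when Rcorr equals 1")
    # Familiarity is estimated only from trials not already recollected.
    f_corr = k_corr / (1.0 - r_corr)
    return r_corr, k_corr, f_corr

# Hypothetical proportions, not data from the study.
r_corr, k_corr, f_corr = corrected_estimates(0.5, 0.3, 0.1, 0.06)
print(r_corr, k_corr, f_corr)
```

Dividing by 1 − Rcorr reflects the independence assumption: a “know” response can only be given when an item is not recollected, so raw “know” proportions would underestimate familiarity.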

Figure 4. Recognition memory performance 1 day after encoding (Experiment II). The corrected hit-rate (corrected remember estimates + corrected familiarity estimates) for novel pictures followed the pattern of adaptive scaling. Medium reward value (£0.50) improved learning when it was the higher possible reward compared to when it was the lower possible reward (indicated by the asterisk, P < 0.05, two-tailed). Error bars denote one standard error of the mean.

Experiment III

Although subjects responded more slowly and less accurately than in Experiments I and II, they nevertheless discriminated relatively more novel from relatively more familiar pictures across all contexts (see Supporting Information Table S1 for RTs, hit-rates, FA-rates, and d'-values). By analogy with Experiment I, we analyzed the hemodynamic effects of novelty using parametric modulation with three modulators. The first parametric modulator represented an adaptively scaled prediction error, in other words a binary response to contextual novelty; the second represented the absolute novelty status (as defined by the reciprocal of the number of repetitions); and the third represented the unscaled or linear prediction error (see Fig. 2 and Materials and Methods). As in Experiment I, we started our analysis with this model to give the “scaled adaptive coding” regressor the most explanatory power.
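The three modulators can be sketched for a single context with two possible outcomes. This is an illustration under stated assumptions, not the authors' implementation: the two outcomes are treated as equally likely, the contextual prediction is taken to be the mean absolute novelty, and novelty is computed from a presentation count of at least 1. The function name and the counts are hypothetical.

```python
# Sketch only: three candidate codes for novelty within one context.
# Assumptions: two equally likely outcomes; prediction = mean absolute novelty;
# counts are numbers of presentations (>= 1), chosen here for illustration.

def novelty_modulators(presentations_a, presentations_b):
    # Absolute coding: reciprocal of the number of presentations.
    absolute = [1.0 / presentations_a, 1.0 / presentations_b]
    expected = sum(absolute) / 2
    # Linear prediction error: deviation from the contextual prediction.
    linear_pe = [a - expected for a in absolute]
    # Scaled adaptive coding: binary, only the sign of the deviation survives.
    scaled = [1.0 if pe > 0 else -1.0 if pe < 0 else 0.0 for pe in linear_pe]
    return absolute, linear_pe, scaled

context_1 = novelty_modulators(1, 2)
context_2 = novelty_modulators(2, 4)
print(context_1)
print(context_2)
```

In this toy example an item presented twice has the same absolute novelty (0.5) in both contexts, yet it is the less novel outcome in the first context and the more novel outcome in the second, so its scaled code flips sign while the size of the linear prediction error differs between contexts.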

An effect of scaled adaptive coding for novelty was expressed in MTL regions including the hippocampus and rhinal cortex, in prefrontal cortex including orbital parts (mOFC) (see Fig. 5), and in midbrain (including lateral parts of the SN/VTA, Supporting Information Fig. S3; see Supporting Information Table S4 for a complete list of activated regions). As in Experiment I, although absolute novelty and the linear prediction error were also expressed in regions including the medial temporal lobe and prefrontal cortex (Supporting Information Fig. S4 and Supporting Information Table S4), it is important to note that the activation patterns for the three parameters did not overlap as a result of the exclusive masking procedure. In a second model, “adaptive scaling” was entered as the last parametric modulator (following “linear prediction error” and “absolute coding” of novelty); this model also revealed activations for “adaptive scaling” of novelty in bilateral hippocampus and orbital prefrontal cortex (Supporting Information Fig. S1). The hemodynamic responses associated with cue processing are available as Supporting Information.

Figure 5. fMRI results (Experiment III). Hemodynamic responses for novelty outcome expressed scaled adaptive coding within the hippocampus (A), rhinal cortex (B), and orbital parts of the mPFC (C). Parameter estimates reflect the fit of the different parametric modulators (“scaled adaptive coding,” “absolute coding,” “linear PE”); within the depicted regions, only the regressor for “scaled adaptive coding” significantly explained variance of the hemodynamic responses (indicated by asterisk). Direct comparison shows significant differences between parametric modulators expressing “scaled adaptive coding” and “linear PE” or “absolute coding” (P < 0.05, two-tailed; except “adaptive scaling” vs. “linear PE” in rhinal cortex, P < 0.05, one-tailed). [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Finally, based on the first model, we performed a second-level ANOVA for both novelty and reward to identify common brain regions that expressed scaled adaptive coding of reward and scaled adaptive coding of novelty. Commonly activated brain regions were identified by using implicit masking (i.e., voxels activated for scaled adaptive coding of reward were implicitly masked by those that were activated for scaled adaptive coding of novelty); this analysis identified regions within the MTL including the hippocampus and rhinal cortex and also orbital–medial PFC (Fig. 6, Supporting Information Table S5). In none of these regions did we observe a correlation between individual d'-values and the fit of the adaptive scaling model (β-values; P > 0.05). On the other hand, the contrasts for (1) absolute coding of reward and absolute novelty status, and (2) the linear prediction error of reward and the linear prediction error of novelty did not reveal any commonly activated voxels within the a priori defined brain regions.

Figure 6. Common areas expressing scaled adaptive coding of reward and novelty. As identified by between-subject ANOVA and implicit masking, overlapping regions within the hippocampus (A), rhinal cortex (B), and an orbital part of the mPFC (C) expressed scaled adaptive coding of reward as well as novelty. Note that parameter estimates reflect the fit of the parametric modulator (“scaled adaptive coding”). As indicated by the asterisk within the depicted regions, only the parametric modulator expressing “scaled adaptive coding” significantly explained variance of the hemodynamic responses. Error bars denote one standard error of the mean. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

DISCUSSION

Our data confirm the hypothesis that, as for reward magnitude, the neuronal response to novelty is scaled adaptively as a function of contextual predictions. In particular, hippocampus, rhinal cortex, and orbital–medial PFC participated in scaled adaptive coding of both reward and novelty. This anatomical overlap was notably absent for signals reporting absolute novelty or the linearly coded prediction error for novelty (the deviation of stimulus novelty from contextual predictions).

The common participation of these three regions in adaptive scaling is consistent with the well-established functional and anatomical connectivity of these regions. All these regions interact functionally through monosynaptic projections from the hippocampus to the medial PFC and from medial PFC to the rhinal cortex, the main input and output gateway for the hippocampus [Miller and Cohen,2001; Wallis,2007]. Our data demonstrate that this network adapts its gain and sensitivity during reward and novelty processing in a manner that accords with the statistics of likely prediction errors [Tobler et al.,2005].

Remarkably, this adaptive scaling within the MTL-mPFC network also translated directly into measurable effects on long-term memory performance. The first two experiments showed that scaled adaptive processing of reward was expressed behaviorally in the modulation of recollection and familiarity, which are processes that are critically dependent on the integrity of these regions [Duzel et al.,2001; Eichenbaum et al.,2007; Mishkin et al.,1998; Squire et al.,2004; Yonelinas et al.,2002]. Hence, memory performance did not reflect the activity pattern of brain regions that represented the unscaled difference between predicted and actual reward or those reflecting an absolute expression of reward magnitude (Supporting Information Table S3). This observation supports the idea that a hippocampal/rhinal contribution to long-term memory for novel images in Experiment II was also scaled adaptively according to an overall response to the preceding reward outcomes (observed in Experiment I). Our results indicate that adaptively scaled mismatch signals in the hippocampus and rhinal cortex are privileged in their control over learning. That is, memory improvement is more closely related to whether rewards are higher or lower than predicted within a particular context than to how much they deviate from predictions.

Our observation that the hippocampus participates in adaptive scaling of reward is not entirely unexpected. Although often overlooked, there is long-standing evidence that the hippocampus contributes to certain forms of reinforcement learning [Devenport et al.,1981; Holscher et al.,2003; Ploghaus et al.,2000; Purves et al.,1995; Rolls and Xiang,2005; Tabuchi et al.,2000; Weiner,2003]. For instance, the rodent hippocampus shows increased activity in baited but not unbaited maze arms [Holscher et al.,2003]; in nonhuman primates, it is involved in learning place reward associations [Rolls and Xiang,2005]; hippocampal activity follows prediction error learning rules for aversive stimuli in humans [Ploghaus et al.,2000]; and reward increases synchronization between hippocampus and nucleus accumbens neurons [Tabuchi et al.,2000]. Furthermore, lesioning the hippocampus impairs contextual forms of latent inhibition [Purves et al.,1995] and leads to excessive responses or repetitive behavior in instrumental learning tasks [Devenport et al.,1981]. Our present results extend these findings by showing that the expression of a reward signal within the MTL can be adaptively scaled with direct impact on reward-related modification of hippocampal learning [Adcock et al.,2006; Wittmann et al.,2005].

To accomplish adaptive scaling, the hippocampal-rhinal-OFC network is likely to receive information about actual magnitudes. In the case of reward, these are likely to derive from the medial superior frontal gyrus (Supporting Information Table S3), a finding that is compatible with animal studies showing that dorsal portions of PFC are capable of representing specific (and therefore probably absolute) properties of reward [Sakagami and Watanabe,2007]. In the case of novelty, information about actual magnitudes is also present within the hippocampus (Supporting Information Fig. S4 and Supporting Information Table S4). In fact, the intrinsic functional architecture of the hippocampus may be capable of generating novelty signals by comparing incoming information in hippocampal subregion CA1 against contextual predictions about likely events generated in CA3 [Hasselmo and Wyble,1997; Lisman and Grace,2005]. Our data suggest that through its connectivity with the rhinal cortex and the OFC, this hippocampal architecture also generates adaptively scaled novelty responses. Whether or not the hippocampus plays a similar role in generating reward responses is much less clear. The important finding here is that MTL-dependent memory formation mirrors the adaptively scaled reward signal in the hippocampus and rhinal cortex (see Fig. 4).

Adaptive coding responses for novelty were adjacent to absolute coding and linear prediction error responses in the right hippocampus (Supporting Information Fig. S4). Such anatomical proximity of qualitatively different response patterns in the hippocampus has been reported before. For instance, in a recent study of recognition memory, adjacent hippocampal regions either responded with linearly increasing repetition suppression to increasing levels of familiarity or distinguished in a binary fashion novel from familiar stimuli (irrespective of the level of familiarity) [Johnson et al.,2008]. One converging possibility is that absolute coding of familiarity strength is a retrieval process that contributes to recognition memory (discrimination of old and new items), whereas adaptive coding is an encoding mechanism that modulates how much is learned from repeated or novel events [Duzel et al.,2003; Johnson et al.,2008]. In fact, retrieval and encoding processes in the MTL are likely to be closely interrelated [Hasselmo,2005], and behavioral evidence for this suggestion is the so-called “testing” phenomenon: this refers to the observation that the mere retrieval of memories is an efficient mechanism to improve episodic memory [Buckner et al.,2001; Chan and McDermott,2007; Karpicke and Roediger,2008]. The extension that our findings suggest is that encoding is not just a mirror image of retrieval processes, in the sense that familiar stimuli elicit encoding activity that decreases proportionally with the repetition suppression they elicit. Rather, two stimuli that elicit different levels of repetition suppression in the MTL can engage the same magnitude of MTL encoding responses if they are similar in their degrees of relative novelty in their own respective contexts. 
By this token, adaptive coding of novelty regulates the likelihood of encoding events with different levels of novelty by re-expressing their medio-temporal repetition effects scaled according to contextual expectations and probabilities. These contextual scaling mechanisms thus qualitatively extend the “novelty encoding hypothesis” [Tulving et al.,1996].

Although we have approached adaptive scaling of novelty in the context of a mesolimbic mechanism of encoding, there is also a possibility that it contributes to recognition memory by enhancing the gain of neural response differences between old and new stimuli. There is already evidence that qualitatively different neural repetition effects within rhinal cortex, repetition suppression [Brown and Aggleton,2001; Brown and Xiang,1998; Gonsalves et al.,2005; Johnson et al.,2008; Miller and Desimone,1994; Sauvage et al.,2008], and repetition enhancement (so-called match enhancement) [Duzel et al.,2003; Kumaran and Duzel,2008; Miller and Desimone,1994; Shohamy and Wagner,2008] can support recognition memory. By this token, adaptive scaling could be a result of repetition enhancement to the more novel of two possible outcomes and repetition suppression to the less novel outcome. Electrophysiological recordings in nonhuman primates suggest that both suppression and enhancement responses can come early enough (80 ms after the onset of a visual stimulus) [Miller and Desimone,1994] to support even very rapid recognition memory judgments (see also Bunzeck et al. [2009]). However, this possibility is not supported by our data because adaptively scaled novelty signals from the hippocampus and the rhinal cortex were not correlated with discriminability (d-prime) of old and new items across participants. Instead, the d-prime values in the three novelty contexts (0.5, 0.2, and 0.76) paralleled quite closely the regressor coding the absolute novelty difference between stimuli in the three contexts (0.5, 0.35, and 0.8).
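For reference, discriminability (d-prime) is conventionally defined in signal detection theory as the difference between the z-transformed hit rate and false-alarm rate. The sketch below uses that standard definition; how the authors handled hit or false-alarm rates of exactly 0 or 1 is not stated, so the clipping floor is an assumption of this example, and the function name is hypothetical.

```python
# Sketch only: standard signal-detection d-prime.
# Assumption: rates of exactly 0 or 1 are clipped to keep the z-transform finite.
from statistics import NormalDist

def d_prime(hit_rate, fa_rate, floor=0.01):
    """Return z(hit rate) - z(false-alarm rate)."""
    def clip(p):
        return min(max(p, floor), 1.0 - floor)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(clip(hit_rate)) - z(clip(fa_rate))

# Hypothetical rates, not data from the study.
print(d_prime(0.8, 0.2))
```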

As an encoding mechanism, adaptive scaling of novelty is unlikely to be called into play in all situations in which novel and familiar information is encountered in the same context. We observed in an earlier study [Bunzeck and Duzel,2006] that when stimuli in a given context can be treated as single, unrelated, events that do not need evaluation with respect to contextual predictions, and for which such interstimulus relationships are not goal-relevant, then veridical (absolute) coding alone dominates. Such a situation is present when the memory difference between old and new items can be ignored (is incidental) because task demands require for instance a semantic decision on each item (e.g., is this an indoor or an outdoor scene?). In addition, under such conditions, novelty may even act as a mechanism to enhance long-term memory for familiar items [Bunzeck and Duzel,2006; Fenker et al.,2008; Li et al.,2003]. By contrast, when, as in the current study, assessment, encoding, and retrieval operations go hand in hand because the encoding of novelty follows an explicit evaluation of relationships with respect to other stimuli in the same context, scaled coding mechanisms should come into play.

It is conceivable that the mesolimbic and mesocortical structures identified might be involved whenever adaptive scaling is necessary for comparisons, and not just in the case of novelty and reward. However, this seems unlikely, since these regions of interest were determined based on a wealth of previous studies that specifically focused on novelty and reward processing. In fact, the intraparietal sulcus (IPS) (not scanned in our fMRI studies), for instance, has been linked with numerical magnitude processing and comparisons of angles, lines, physical size of stimuli, and stimulus brightness [Cohen Kadosh et al.,2005; Fias et al.,2003], which led to the assumption of a common IPS-based coding mechanism of magnitude and quantities [Cohen Kadosh et al.,2008; Walsh,2003]. Although to our knowledge there is no direct evidence that the IPS expresses adaptively scaled magnitude signals, it has been suggested that its function in magnitude estimation is strongly linked with underlying response-selection [Gobel et al.,2004; Walsh,2003]. An interesting possibility is that the IPS may link as a general purpose metric assessment device to mesolimbic and mesocortical structures, which then compute their contextual adaptive motivational value. For instance, it is known that the size of place fields coded within the hippocampus is dependent on the size of the environment [Ahmed and Mehta,2009]. Whether such contextual coding is related to the phenomenon we observed here is unclear.

Taken together, our current findings and earlier observations [Bunzeck and Duzel,2006] regarding the magnitude of novelty signals show a remarkable parallel with nonhuman primate studies of reward processing in the SN/VTA [Tobler et al.,2005]. In the absence of any explicit predictive stimuli, the activation of DA neurons increased with the reward value of unpredicted liquids, suggesting absolute coding of reward magnitude, whereas explicit predictions about contextual probabilities of rewards led to adaptively scaled representations of reward magnitude. Note that our observations regarding adaptive scaling were outside of the SN/VTA. Although predicted, we did not obtain convincing evidence that the SN/VTA distinguished between adaptive scaling, absolute coding, or a response reflecting a linear prediction error for reward or novelty (there was an adaptively scaled novelty response within the midbrain comprising lateral parts of the SN/VTA complex, which we cannot confidently relate to SN/VTA; Supporting Information Fig. S3). One reason for this negative finding may have been that, as in nonhuman primates [Tobler et al.,2005], only a proportion of reward (and novelty) responsive SN/VTA neurons showed adaptive scaling and that these selectively responding neural populations were spatially too close to each other to be distinguished with fMRI.

Adaptive scaling is relevant in both highly familiar and novel environments when comparative novelty among stimuli is important. In well-known environments, e.g., the daily route to work, most stimuli are predicted to be highly familiar. In this case, neural responses to small and subtle changes to the environment will be adaptively enhanced, enabling rapid updating of existing knowledge and helping maintain behavioral flexibility. On the other hand, in environments that are unknown, e.g. when exploring a new city, leading to a default prediction of novel events, adaptive reductions of novelty responses can filter out excess information, thereby helping minimize interference. Susceptibility to interference and failure to update existing knowledge accompany a number of neurological and psychiatric conditions such as OFC lesions [Miller and Cohen,2001; Wallis,2007], schizophrenia [Servan-Schreiber et al.,1996], and age-related memory dysfunction accompanying minimal cognitive impairment (MCI) [Della Sala et al.,2005]. As a mechanism that allows efficient memory updating and filtering out excess information, adaptive coding offers a quantitative framework, which might help provide a better understanding of clinical symptoms and syndromes.

While emphasizing the remarkable anatomical overlap for the adaptive coding of novelty and reward within hippocampus, rhinal cortex, and OFC, we are not arguing that novelty and reward are equivalent psychological constructs. For example, recent evidence shows that SN/VTA responses to reward and novelty correlate with different personality traits [Krebs et al.,2008]. As expected, a number of brain regions, most notably the nucleus accumbens, responded differently to novelty and reward. In fact, there is a subtle, but important difference between the roles suggested for the adaptive scaling of reward and novelty. To the extent that adaptive scaling of reward prediction errors constitutes an efficient neural code, we may expect downstream mechanisms to invert it in working out how to adapt predictions of future reward. If this inversion is imperfect, then the predictions will be incorrect, a fact that might underlie some anomalies of appetitive choice [Kahneman and Tversky,1979]. However, we show directly that adaptive scaling of reward is not inverted in this way, such that the encoding of stimuli into memory is indeed dependent on relative rather than absolute reward. We have also discussed some ideas as to why adaptive scaling of novelty might be appropriate. However, this leaves open the issue as to what happens when novelty itself is treated as being rewarding, as in exploration bonuses [Kakade and Dayan,2002] inspiring increased exploratory behaviors, an outcome that is critically linked to hippocampal integrity [Flicker and Geyer,1982; Honey et al.,1998; Knight,1996; Sokolov,1963; Vinogradova,2001]. Here, adaptive scaling would allow the relative allocation of exploration among better or worse known parts of an environment to be separated from the absolute question as to how much to explore as a whole.

REFERENCES

Supporting Information

Additional Supporting Information may be found in the online version of this article.

Filename: HBM_20939_sm_suppfigs.doc; Size: 2169K; Description: Supporting Figures
