It has recently been suggested that learning signals in the amygdala might be best characterized by attentional theories of associative learning [such as Pearce–Hall (PH)] and more recent hybrid variants that combine Rescorla–Wagner and PH learning models. In these models, unsigned prediction errors (PEs) determine the associability of a cue, which is used in turn to control learning of outcome expectations dynamically and reflects a function of the reliability of prior outcome predictions. Here, we employed an aversive Pavlovian reversal-learning task to investigate computational signals derived from such a hybrid model. Unlike previous accounts, our paradigm allowed for the separate assessment of associability at the time of cue presentation and PEs at the time of outcome. We combined this approach with high-resolution functional magnetic resonance imaging to understand how different subregions of the human amygdala contribute to associative learning. Signal changes in the corticomedial amygdala and in the midbrain represented unsigned PEs at the time of outcome showing increased responses irrespective of whether a shock was unexpectedly administered or omitted. In contrast, activity in basolateral amygdala regions correlated negatively with associability at the time of cue presentation. Thus, whereas the corticomedial amygdala and the midbrain reflected immediate surprise, the basolateral amygdala represented predictiveness and displayed increased responses when outcome predictions became more reliable. These results extend previous findings on PH-like mechanisms in the amygdala and provide unique insights into human amygdala circuits during associative learning.
Prediction errors (PEs; the differences between expected and received outcomes) serve different functions across formal learning models. Rescorla–Wagner (RW) models are often referred to as unconditioned stimulus (US) processing models, because associative change directly depends on changes of signed PEs (Rescorla & Wagner, 1972). Attentional learning models, in contrast (Mackintosh, 1975; Pearce & Hall, 1980), are known as conditioned stimulus (CS) processing models as error signals within these models only indirectly affect learning by modulating the attention to the CS. In these models, the unsigned PE (its absolute value) serves as a measure of how surprising an outcome is and determines the effectiveness of a cue to be associated with a certain outcome (a property known as associability). More recent accounts have suggested hybrid learning models based on the idea of combining former CS and US processing models (Le Pelley, 2004). Here, PEs drive learning as in the RW model, but learning rates are changed dynamically by the cue's associability.
At the neural level, a recent functional magnetic resonance imaging (fMRI) study (Li et al., 2011) has suggested that amygdala responses during aversive learning might be best described by computational signals derived from such hybrid models. Additionally, studies in rodents and monkeys have reported unsigned Pearce–Hall (PH)-like PEs and similar surprise signals in the amygdala and dopaminergic midbrain (Matsumoto & Hikosaka, 2009; Calu et al., 2010; Roesch et al., 2010). However, previous studies investigating PH-like learning signals in humans are rare and did not disentangle signals in the amygdala related to CS and US processing. Here, we employed an aversive Pavlovian reversal-learning task in a paradigm that allowed for separate assessment of CS and US responses, and combined this approach with high-resolution fMRI to investigate the contribution of amygdala subregions.
In a first step, we tested whether an RW/PH hybrid learning model provides a more accurate explanation of behaviour than a simple RW model. In a second step, learning signals derived from the hybrid model were correlated with neuronal activity to identify a representation of the unsigned PE at the time of outcome and a representation of associability at the time of cue presentation. The unsigned PE reflects a signal of immediate surprise, whereas associability represents the reliability of prior outcome predictions and is used to update current outcome expectations. Both signals were expected to be represented in the amygdala as this brain region is known to play a crucial role in mediating vigilance and attention (Davis & Whalen, 2001), and in learning to predict aversive outcomes (Schiller et al., 2008; Eippert et al., 2012). Additionally, we tested for unsigned PE effects in the midbrain as recent animal studies suggest that the surprise-induced enhancement of learning depends on the integrity of midbrain–amygdala connections (Lee et al., 2006, 2008, 2010).
Materials and methods
Twenty-two healthy male subjects (mean age 26.9 years, range 21–33 years) participated in this study. Due to equipment malfunction one subject had to be excluded from further analysis. Only male volunteers were enrolled in the study to reduce variability in amygdala activation based on gender effects in conditioning and other emotional tasks (Milad et al., 2006; Cahill, 2010). The study was approved by the Ethics Committee of the Medical Board in Hamburg. All participants gave written informed consent and were paid for their participation.
The paradigm used was a classical Pavlovian delay-conditioning procedure including an acquisition and a reversal phase. Reversal of cue–outcome associations provided a characteristic test to assess associability as the outcomes became surprising again with the beginning of the reversal phase (Holland & Gallagher, 1999; Li et al., 2011). Abstract fractal images served as visual CSs and the US was an electric shock (150 ms duration, 20 pulses/s of 0.01–100 mA) delivered to the dorsum of the right hand. Before the experiment, the intensity of the US was individually adjusted to be aversive. For this purpose, volunteers received shocks of varying intensity. They rated the aversiveness of the shocks on a rating scale ranging from 1 (not aversive) to 10 (very aversive and not tolerable). The intensity of the shock that received the rating score 9 (very aversive, but still bearable) was selected as the final intensity administered throughout the experiment.
We used three different visual cues, each presented 40 times throughout the entire experiment. During acquisition, cue A co-terminated with a shock in 20/20 occasions (100% reinforcement, CS100), cue B was paired with shock in 10/20 occasions (50% reinforcement, CS50) and cue C was never followed by the US (CS–). Thus, the acquisition phase comprised the first 20 occurrences of each cue. In the reversal stage, the CS100 was not paired with shock (new CS–), whereas the CS50 was followed by a shock in every trial (new CS100) and the CS– was followed by a shock on 50% of the occasions (new CS50). The reversal phase immediately followed the acquisition phase. It started after a fixed number of trials such that the 61st trial was always the first trial of the reversal phase. Again, each cue was presented 20 times leading to a total experimental time of approximately 40 min. The assignment of the different images to the respective contingency condition was counterbalanced across subjects and the trial order was pseudorandomized for each subject (no more than two consecutive reinforced trials).
Each CS was presented for 10 s during the experiment and the US was administered at 9.85 s in paired trials and co-terminated with the CS (Fig. 1A). At 7 s after CS onset, subjects performed an expectancy rating, which consisted of judging the likelihood of US delivery on a discrete scale. For this purpose, three symbols (−, ? and +) appeared underneath the fractal images for 2 s and participants rated the likelihood via button press according to the symbols' meaning (“no shock”, “maybe shock”, “shock”). The intertrial interval varied randomly between 9 and 11 s across trials. During the intertrial interval a fixation cross was shown on the screen.
The instructions were to concentrate on the images and complete the expectancy rating spontaneously in every trial. Subjects were aware that their responses did not influence the likelihood of US delivery. Before the experiment, subjects were familiarized with the task using different fractal images without US presentation. Subjects were not informed about the two experimental phases prior to the experiment and the reversal stage started immediately after acquisition without announcement.
Task presentation and recording of behavioural responses were performed with the software Presentation (Neurobehavioral Systems, Albany, CA, USA). Skin conductance responses (SCRs) were recorded using a constant voltage system with Ag/AgCl electrodes placed on the hypothenar eminence of the left hand. Responses were amplified and digitized at 1000 Hz (CED 2502 and micro 1401, Cambridge Electronic Design, Cambridge, UK).
A 3 Tesla system (TRIO; Siemens, Erlangen, Germany) equipped with a 32-channel head coil was used for acquisition of the fMRI data. Thirty-six transversal slices (slice thickness, 1.5 mm; no gap) were obtained in each volume using a high-resolution T2*-sensitive gradient echo-planar imaging sequence (repetition time, 2680 ms; echo time, 30 ms; flip angle, 80°; field of view, 219 × 219 mm; in-plane resolution, 1.5 × 1.5 mm; parallel imaging with acceleration factor 2). Functional image coverage included the medial temporal lobe, parts of the prefrontal cortex and brainstem areas. High-resolution T1-weighted anatomical images (1 × 1 × 1 mm³) were also acquired using a magnetization prepared rapid gradient echo sequence.
We acquired four sessions consisting of between 210 and 250 volumes each, to sustain optimal quality of the high-resolution fMRI data. The sessions followed one another with the shortest possible latency (40–50 s) and the experimental presentation was not interrupted during scanner breaks, to ensure continuous learning unbiased by attentional changes caused by experimental breaks. Subjects were told beforehand that the scanner breaks were due to technical issues and had no meaning for the experimental task. No break occurred in close temporal proximity to the beginning of the reversal phase. All trials occurring during a scanner break (or during the acquisition of the first four volumes of the subsequent session) were discarded from further analysis of the imaging data. As the trial order was randomized, the condition assignment of discarded trials differed between subjects.
Behavioural and skin conductance data analysis
Expectancy ratings were coded to values of 0 (no shock), 0.5 (maybe shock) and 1 (shock). Skin conductance data from two subjects were discarded due to poor signal quality. Data from the remaining subjects were downsampled to 10 Hz and low-pass filtered (cutoff frequency 1 Hz) to remove scanning artefacts. We analysed SCRs starting within a time window of 1–3 s after CS onset as base-to-peak amplitude differences. The resulting skin conductance amplitudes were log-transformed and averaged for each condition. Behavioural data were further analysed using Matlab 7.8 (MathWorks, Natick, MA, USA) and SPSS (IBM, Armonk, NY, USA).
We compared the fit of two alternative learning models to trial-by-trial expectancy ratings in order to validate a model for the subsequent fMRI analysis. An RW delta type learning rule, in which PEs drive learning, was compared with an RW/PH hybrid model, in which associability as a function of the reliability of prior predictions controls learning rates dynamically.
In the RW model, the PE (δt) is defined as the difference between the outcome on trial t (rt), i.e. shock delivery (rt = 1 for shock and rt = 0 for omission of a shock), and the expected outcome (Vt) on the same trial (δt = rt − Vt). The value (Vt) is updated in every trial according to

$$V_{t+1} = V_t + \kappa\,\delta_t$$
The constant learning rate κ as well as the initial value V0 were the free parameters of this model.
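The delta rule described above can be illustrated with a minimal sketch. The function name and the example outcome sequence are hypothetical and serve only to show how the expectation and the signed PE evolve for a single cue.

```python
def rescorla_wagner(outcomes, kappa, v0):
    """Return per-trial expectations V_t and signed prediction errors delta_t.

    outcomes: sequence of r_t (1 = shock, 0 = omission) for a single cue
    kappa:    constant learning rate
    v0:       initial expectation V_0
    """
    values, errors = [], []
    v = v0
    for r in outcomes:
        delta = r - v            # delta_t = r_t - V_t
        values.append(v)
        errors.append(delta)
        v = v + kappa * delta    # V_{t+1} = V_t + kappa * delta_t
    return values, errors


# Hypothetical fully reinforced cue, four presentations
values, errors = rescorla_wagner([1, 1, 1, 1], kappa=0.5, v0=0.0)
print(values)  # [0.0, 0.5, 0.75, 0.875]
print(errors)  # [1.0, 0.5, 0.25, 0.125]
```

With a constant learning rate, the PE shrinks by the same proportion on every reinforced trial, which is the behaviour that the hybrid model relaxes.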
Whereas in the original PH model PEs do not directly drive learning, the basic assumption of learning by PEs as stated in the RW model is maintained in the RW/PH hybrid model (Le Pelley, 2004). However, unlike in the RW model, learning rates change dynamically in every trial depending on the reliability of prior predictions (i.e. the associability α). Formally, the hybrid model that we applied can be described as follows

$$\delta_t = r_t - V_t$$

$$\alpha_t = \eta\,|\delta_{t-1}| + (1 - \eta)\,\alpha_{t-1}$$

$$V_{t+1} = V_t + \kappa\,\alpha_{t+1}\,\delta_t$$
Accordingly, the associability on trial t (αt) is a function of the associability on the preceding episode plus the absolute or unsigned PE of the previous trial and the parameter η determines the relative weight given to the two terms of the sum. Figure 1B shows the assumed updating of parameters in relation to the actual chronology of events. Besides the learning rate κ and the initial value V0, η was an additional free parameter in the hybrid model. Thus, the RW model is nested in the hybrid model by setting η to 0 and the behavioural fit of the two models can be compared using likelihood ratio tests taking the different number of free parameters into account (Lewandowsky & Farrell, 2011).
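The hybrid update for a single cue can be sketched as follows. This is an illustration under stated assumptions rather than the fitted implementation: the function name is hypothetical, and the ordering (associability refreshed at the next cue presentation, then used to scale the value update) is inferred from the verbal description of the model.

```python
def hybrid_model(outcomes, kappa, eta, v0, alpha0=1.0):
    """Simulate one cue; returns (values, associabilities, prediction_errors).

    Assumed ordering: at each new presentation the associability is first
    refreshed from the previous unsigned PE and then scales the value update.
    """
    values, alphas, errors = [], [], []
    v, alpha, prev_delta = v0, alpha0, None
    for r in outcomes:
        if prev_delta is not None:
            # alpha_t = eta * |delta_{t-1}| + (1 - eta) * alpha_{t-1}
            alpha = eta * abs(prev_delta) + (1.0 - eta) * alpha
            # value update driven by the previous PE, weighted by alpha_t
            v = v + kappa * alpha * prev_delta
        delta = r - v            # PE at the time of outcome
        values.append(v)
        alphas.append(alpha)
        errors.append(delta)
        prev_delta = delta
    return values, alphas, errors


# Hypothetical fully reinforced cue
values, alphas, errors = hybrid_model([1, 1, 1], kappa=0.5, eta=0.5, v0=0.0)
print(values)  # [0.0, 0.5, 0.6875]
print(alphas)  # [1.0, 1.0, 0.75]

# With eta = 0 the associability stays at alpha0 = 1 and the model reduces to RW
rw_values, rw_alphas, _ = hybrid_model([1, 1, 1], kappa=0.5, eta=0.0, v0=0.0)
print(rw_values)  # [0.0, 0.5, 0.75]
```

The second call makes the nesting explicit: with η = 0 and an initial associability of 1, the effective learning rate is the constant κ.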
To fit the models to the data, maximum likelihood estimation was applied. For each model, we estimated a single parameter set θ for all participants by minimizing the deviance between the data (i.e. the expectancy rating E) and the model (i.e. the value V), which is based on the negative log-likelihood (ln L; Lewandowsky & Farrell, 2011) summed over all participants and all trials

$$-2 \ln L = -2 \sum_{n=1}^{N} \sum_{t=1}^{T} \ln \phi(E_{n,t}, V_{n,t}, \sigma)$$
where ϕ(x, μ, σ) represents the value of the normal distribution at x (x = En,t) with mean μ (μ = Vn,t) and SD σ (σ = 0.3; ϕ was discretized into 12 bins), N is the number of participants and T is the number of trials per participant. We estimated a single set of parameters for all participants because this has frequently been shown to be optimal for model-based fMRI data analysis as it prevents outlying individual parameter estimates (Gläscher et al., 2010; Li et al., 2011). However, we also confirmed the results of this procedure by fitting the models on an individual subject level. Model fitting commenced with a grid search over the entire parameter space (from 0 to 1 in steps of 0.1 for all free parameters) to avoid local minima. The best-fitting parameter estimates obtained by this approach were then used as initial values for the final optimization procedure, which was based on the Simplex method (Nelder & Mead, 1965) as implemented in the Matlab function fminsearchcon. The initial associability α0 was set to 1 assuming that the uncertainty about outcome predictions is initially maximal for each cue.
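The grid-search stage of this procedure can be sketched for the simpler RW model. The sketch rests on several simplifying assumptions: it uses the continuous normal density instead of a version discretized into 12 bins, it fits one hypothetical cue of one simulated participant, and all function names are illustrative.

```python
import math
from itertools import product

SIGMA = 0.3  # fixed SD of the rating noise, as stated in the text


def rw_values(outcomes, kappa, v0):
    """Expectations V_t of a Rescorla-Wagner learner for one cue."""
    v, values = v0, []
    for r in outcomes:
        values.append(v)
        v += kappa * (r - v)
    return values


def deviance(ratings, values, sigma=SIGMA):
    """-2 times the summed log normal density of each rating around V_t."""
    log_lik = 0.0
    for e, v in zip(ratings, values):
        z = (e - v) / sigma
        log_lik += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return -2.0 * log_lik


def grid_search(outcomes, ratings):
    """Evaluate kappa and V0 from 0 to 1 in steps of 0.1; return the best pair."""
    grid = [i / 10 for i in range(11)]
    return min(product(grid, grid),
               key=lambda p: deviance(ratings, rw_values(outcomes, *p)))


# Hypothetical noise-free ratings generated with kappa = 0.5 and V0 = 0
outcomes = [1] * 6
ratings = rw_values(outcomes, 0.5, 0.0)
best_kappa, best_v0 = grid_search(outcomes, ratings)
print(best_kappa, best_v0)  # 0.5 0.0
```

In a full analysis the grid optimum would only serve as the starting point for a simplex refinement, and the deviance would be summed over all participants and trials.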
The RW and the hybrid model were fitted to the data in several variations and the resulting deviances were then compared using likelihood ratio tests. First, we fitted both models across all subjects and trials, and obtained one single set of parameter estimates. As the speed and accuracy of learning probably relates to each cue's contingencies and changes in contingencies, we further sought to optimize model fit by fitting both models separately for each condition (resulting in one set of parameters for each contingency condition, i.e. each cue). Deviances of the condition-wise fitted hybrid model were also compared with the hybrid model that was fitted across conditions. We finally adopted the condition-wise fitted parameters of the hybrid model (fitted across all subjects) for the subsequent imaging analysis, as these provided the closest fit to the behavioural data (see 'Results' and Table 1B). Model fitting and comparison were additionally performed on an individual level by fitting each of the above-mentioned models to each subject's behavioural data. Moreover, all models were compared against a baseline model to assure that they outperform a model with random predictions. To estimate the deviance of the baseline model, we randomized model predictions (i.e. the values for V that are compared with the ratings E; see above). As the estimated deviance thus depends on the random selection of values for V, we repeated this procedure 10 000 times and used the average deviance to compare the baseline model against the learning models (see Tables 1 and 2).
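A likelihood ratio test between nested models can be sketched as follows. The difference in deviance is referred to a chi-square distribution whose degrees of freedom equal the number of additional free parameters. The deviance values in the example are hypothetical, and the function names are illustrative.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # closed form for even df: exp(-y) * sum_{k < df/2} y^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # odd df: start from erfc and add half-integer terms of the series
    total = math.erfc(math.sqrt(y))
    term = math.sqrt(y) * math.exp(-y) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= y / (k + 0.5)
    return total


def likelihood_ratio_test(deviance_nested, deviance_full, extra_params):
    """Return the test statistic and its p-value."""
    statistic = deviance_nested - deviance_full
    return statistic, chi2_sf(statistic, extra_params)


# Hypothetical deviances: RW (nested) vs. hybrid (one additional parameter, eta)
stat, p = likelihood_ratio_test(1250.0, 1238.0, extra_params=1)
print(stat, p < 0.001)  # 12.0 True
```

When the models are fitted separately for each condition, the number of additional parameters, and hence the degrees of freedom, increases accordingly.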
Table 1. Model comparisons and best fitting parameters for models fitted across all subjects
***P <0.001. BL, baseline model. ‡Model implementation described in previous accounts (Le Pelley, 2004; Li et al., 2011). †Model implementation used in this study. Likelihood ratio tests were performed based on the deviances aggregated across subjects (840 trials per condition for 21 subjects). The fitting parameters of the hybrid model (B – fitted separately for each condition) were used to generate PE and associability time series for the fMRI analysis.
Statistical parametric mapping (SPM8, Wellcome Trust Centre for Neuroimaging, London, UK) was used for preprocessing and analysing the imaging data. The first four volumes of each session were discarded to account for T1 equilibrium effects. Functional images were realigned to the first remaining volume and co-registered to individual skull-stripped T1 images. Subsequently, the diffeomorphic image registration algorithm (DARTEL) toolbox was used to create a sample-specific structural template as well as individual flow fields, which were used in turn for spatial normalization of the functional images. Data were smoothed with a 4 mm full-width at half maximum isotropic Gaussian kernel and resampled to a voxel size of 1 × 1 × 1 mm³.
A random-effects general linear model analysis was conducted on the fMRI data with separate predictors for each cue [cue A: CS– (acquisition) and new CS50 (reversal); cue B: CS50 (acquisition) and new CS100 (reversal); cue C: CS100 (acquisition) and new CS– (reversal)] at two points in time (CS and potential US onset). Subject-specific regressors for associability and unsigned PEs were generated using the best fitting parameters from the hybrid model fitted for each condition (see Table 1B) and entered as parametric modulators for each cue separately. The associability modulated the CS onset event as this is the point in time when associability is used to influence the value update and when the reliability of prior predictions is likely to be considered for the upcoming expectancy rating (Fig. 1B). The unsigned PE as a surprise signal is generated when the outcome information is available and was therefore used to modulate the US onset regressor preceded by a dummy regressor coding for outcome identity (1, shock; 0, no-shock). In a complementary analysis, we replaced the unsigned PE by the signed PE time series.
Functional images from all four sessions were concatenated and four session-specific constants were further included in the model. Within-session high-pass filtering (128 s cutoff period) and correction for temporal autocorrelation based on a first-order autoregressive model were applied according to the actual session-specific structure. The final first-level model for each subject thus consisted of 22 regressors in total, including session constants, realignment parameters and button presses as effects of no interest. All events were modelled as delta functions and convolved with a haemodynamic response function. Contrast estimates were tested for group level significance using one-sample t-tests.
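How a model-derived time series enters such a design as a parametric modulator can be sketched in a few lines. This is a simplified illustration and not the SPM8 implementation: the double-gamma response only approximates a canonical haemodynamic response function, the modulator is simply mean-centred, and the onsets and modulator values are hypothetical.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def double_gamma_hrf(dt, duration=32.0):
    """Response with an early peak and a late undershoot, sampled every dt s."""
    n = int(duration / dt)
    hrf = [gamma_pdf(i * dt, 6.0) - gamma_pdf(i * dt, 16.0) / 6.0
           for i in range(n)]
    total = sum(hrf)
    return [h / total for h in hrf]


def modulated_regressor(onsets, modulator, n_scans, tr, dt=0.1):
    """Delta functions at the onsets, weighted by the mean-centred modulator,
    convolved with the response function and sampled once per volume."""
    mean = sum(modulator) / len(modulator)
    centred = [m - mean for m in modulator]
    n_bins = int(round(n_scans * tr / dt))
    sticks = [0.0] * n_bins
    for onset, weight in zip(onsets, centred):
        idx = int(round(onset / dt))
        if idx < n_bins:
            sticks[idx] += weight
    hrf = double_gamma_hrf(dt)
    conv = [0.0] * n_bins
    for i, s in enumerate(sticks):
        if s != 0.0:
            for j, h in enumerate(hrf):
                if i + j < n_bins:
                    conv[i + j] += s * h
    return [conv[min(int(round(k * tr / dt)), n_bins - 1)]
            for k in range(n_scans)]


# Two hypothetical cue onsets (s) with low and high associability
regressor = modulated_regressor([0.0, 20.0], [0.0, 1.0], n_scans=20, tr=2.68)
print(len(regressor))  # 20
```

Because the modulator is mean-centred, the resulting regressor captures trial-by-trial variation around the average response, which is modelled by the unmodulated onset regressor.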
To correct for multiple comparisons, we used a family-wise error rate threshold of P <0.05, small volume corrected in predefined regions of interest. Corrections with respect to the amygdala were based on probabilistic maps of the entire structures (obtained from the Harvard–Oxford atlas and thresholded at 50%). No probabilistic map exists for the midbrain and therefore corrections in this region were performed using an anatomical mask that comprised the whole midbrain (Maldjian et al., 2003). Additionally, areas surviving correction at P <0.05 (family-wise error corrected) for the whole acquired brain volume are reported. For display purposes, all maps are thresholded at P <0.005, uncorrected with an extent threshold of k =15 voxels and projected onto the mean, contrast-enhanced DARTEL-normalized T1 image. All activations are reported using x, y, z coordinates in Montreal Neurological Institute space.
To assign observed activations in the amygdala to its subregions, the corresponding coronal slices were compared against schematic tables of an anatomical atlas (Mai et al., 2008). We further consulted cytoarchitectonically defined probabilistic maps (Amunts et al., 2005) that distinguish three amygdala subdivisions: the centromedial (central and medial nuclei), superficial (anterior amygdala area, ventral and posterior cortical nuclei) and basolateral (lateral, basolateral, basomedial and paralaminar nuclei) nuclear group. We refer to the centromedial and the superficial nuclear group as the corticomedial amygdala (CM). This classification of amygdala subregions into a corticomedial and a basolateral part is reasonable from a functional perspective (Maren & Quirk, 2004; LeDoux, 2007; Ehrlich et al., 2009; Pape & Pare, 2010) and in terms of comparability to the few existing fMRI studies that previously reported functional dissociations of amygdala subregions in humans on the basis of high-resolution fMRI (Davis et al., 2010; Gamer et al., 2010; Bach et al., 2011; Boll et al., 2011; Prevost et al., 2011).
The US expectancy ratings as well as the SCRs indicated that volunteers successfully learned the CS–US contingencies (see Figs 2A and B). A 2 × 3 repeated-measures ANOVA with factors phase (acquisition and reversal) and condition (CS–, CS50 and CS100) revealed a significant phase-by-condition interaction for behavioural (F2,40 = 107.05, P <0.001) and autonomic (F2,36 = 9.06, P <0.01) measurements. During acquisition, the averaged expectancy ratings were significantly higher for CS50 as compared with CS– (t20=7.55, P <0.001) and the highest values were observed in the CS100 condition differing significantly from CS– and CS50 expectancy scores, respectively (CS100 > CS–: t20=21.39, P <0.001; CS100 > CS50: t20=6.55, P <0.001). In the reversal stage, US expectancies were successfully reversed in accordance with the new contingency affiliations (new CS100 > new CS–: t20=13.68, P <0.001; new CS100 > new CS50: t20=6.58, P <0.001; new CS50 > new CS–: t20=3.23, P <0.01).
Likewise, the SCRs were higher for CS100 and CS50 as compared with CS– during acquisition (CS100 > CS–: t18=2.84, P <0.05; CS50 > CS–: t18=4.04, P <0.001). The SCRs did not differ significantly from each other in the CS100 and CS50 condition (t18 < 1). In the reversal stage, greater SCR amplitudes to the new CS50 were observed as compared with the new CS– (t18=2.49, P <0.05), but no significant difference was found for a comparison of the new CS100 vs. the new CS– (t18 < 1). We suppose that this latter finding is related to a general habituation effect of SCRs over time (main effect phase: F1,18 = 6.35, P <0.05). Just as during acquisition, no significant difference was observed for a comparison of the new CS100 and the new CS50 condition in the reversal stage (t18 < 1).
Behavioural model fit
Table 1 summarizes the fit parameters and model deviances for the baseline, the RW and the hybrid model for all fitting procedures applied. As the results show, the RW and the hybrid model outperformed the random baseline model. Furthermore, the hybrid model provided a significantly more accurate explanation of behaviour than did the RW model if fitted across conditions (Table 1A), and if each of the conditions was fitted separately to the data (Table 1B). Comparing both fitting alternatives against each other showed that the condition-wise fitted hybrid model provided the best behavioural fit (Table 1C). The parameter η especially varied considerably across conditions as the associative change occurred more gradually in some conditions [e.g. cue B: CS50 (acquisition) and new CS100 (reversal)] than in others [e.g. cue C: CS100 (acquisition) and new CS- (reversal)]. Furthermore, we fitted all models individually to each subject's behavioural data and compared the corresponding deviances summed over all subjects. These results also showed that the hybrid model resulted in a better fit than the RW model and both models provided a superior behavioural fit as compared with the baseline model. Thus, the results described above could also be confirmed on an individual level (see Table 2 for corresponding deviances and results of the likelihood ratio tests). Finally, we adopted the condition-wise fitted parameters of the hybrid model fitted across subjects (Table 1B) for the subsequent imaging analysis. Figure 3 shows the corresponding fitted quantities averaged across subjects for each cue.
Note that, in our implementation of the hybrid model, the associability was updated prior to the value. In a previous study (Li et al., 2011), however, where SCRs were used for model fitting (SCR data were too noisy for model fitting in the present study), the value was updated prior to the associability. As a consequence, the resulting model predicts a somewhat slower learning of sudden contingency changes, which is probably better reflected in implicit measures of fear learning such as SCRs, whereas expectancy ratings require a model predicting faster adaptations such as in the implementation of the hybrid model that we used (see Table 1D for the behavioural model fit of both updating procedures for our data). Importantly, the different updating approaches mainly affect the value parameter, whereas the associability and PE time series (the quantities of interest in the fMRI analysis, see also Fig. 3) are basically the same in either case and also display similar characteristics as in the study of Li et al. (2011), although model fitting was based on different measures.
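The consequence of the update order can be illustrated with a small simulation. This is a sketch under the assumption that the two implementations differ only in whether the associability or the value is updated first after an outcome. The function name, parameter values and outcome sequence are hypothetical.

```python
def simulate(outcomes, kappa, eta, v0=0.0, alpha0=1.0, associability_first=True):
    """Return the expectation at each presentation of a single cue."""
    v, alpha = v0, alpha0
    values = []
    for r in outcomes:
        values.append(v)
        delta = r - v
        if associability_first:
            # the current surprise already boosts the learning rate
            alpha = eta * abs(delta) + (1 - eta) * alpha
            v = v + kappa * alpha * delta
        else:
            # the value is updated with the old associability
            v = v + kappa * alpha * delta
            alpha = eta * abs(delta) + (1 - eta) * alpha
    return values


# Ten unreinforced presentations followed by a sudden change to reinforcement
outcomes = [0] * 10 + [1, 1]
fast = simulate(outcomes, kappa=0.5, eta=0.5, associability_first=True)
slow = simulate(outcomes, kappa=0.5, eta=0.5, associability_first=False)
print(fast[11], slow[11])  # 0.250244140625 0.00048828125
```

After a long run of correctly predicted outcomes the associability is close to zero, so a value update that relies on the old associability barely reacts to the first surprising outcome, whereas updating the associability first lets the expectation adapt on the very next presentation.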
In a first step we investigated the neural representation of the unsigned PE as a measure of immediate surprise at the time of US onset. As shown in Fig. 3, this signal decreased rapidly for the CS– and the CS100 condition, when the outcome started matching the expectations and increased strongly at the beginning of the reversal stage, when outcomes were surprising again. For the partially reinforced cues, the unsigned PE fluctuated more strongly and was equally high for unexpected shocks and unexpected omissions of a shock. Activity in the amygdala correlated positively with this signal (Fig. 4A and Table 3A). Comparisons with the high-detail diagram of an anatomical atlas (Mai et al., 2008) strongly suggest that the observed amygdala activation was located bilaterally in the CM (Fig. 5A for a schematic representation of amygdala subregions). This notion is further supported by the application of probabilistic maps of amygdala subregions (Amunts et al., 2005) showing that peak voxels in both hemispheres were probably located within the CM, whereas their likelihood of being located in the basolateral nuclear group (BLA) was rather low (Table 4).
Table 3. Imaging results
°P <0.05; corrected for multiple comparisons across the whole acquired brain volume. *P <0.05; **P <0.01; corrected for multiple comparisons in anatomical regions of interest. H, hemisphere; L, left; R, right; VTA, ventral tegmental area.
PE and associability-related BOLD responses in regions of interest and regions surviving correction for the whole acquired brain volume (degrees of freedom = 20).
Likelihood (in %) for peak voxels to be located in CM or BLA based on probabilistic maps (Amunts et al., 2005).
The surprise signal of the unsigned PE also yielded a highly focal activation in the midbrain anatomically consistent with the substantia nigra (SN)/ventral tegmental area and activity in the anterior insula (Fig. 4A). We did not observe a significant correlation with blood oxygenation level-dependent (BOLD) responses in the amygdala for signed PEs in a complementary analysis (inspected at a threshold of P <0.05, family-wise error corrected).
In a second step, activity in a different amygdala subregion was found to be negatively correlated with the associability at the time of CS onset (Fig. 4B and Table 3B), whereas no positive correlation could be observed in the amygdala (even at a liberal threshold of P <0.01, uncorrected). As the negative associability indicates the reliability of prior predictions, the observed negative correlations suggest that activity in the amygdala increased whenever outcome predictions became more reliable and decreased when outcome predictions were poor. According to the anatomical atlas as well as the probabilistic maps (Table 4), the observed amygdala activation can be assigned to the BLA. However, it should be noted that, although the probabilistic maps and the anatomical atlas yielded the same amygdala subregions in the present study, the location of amygdala nuclei can differ between both methods. To further corroborate the functional dissociation of the CM and BLA, we directly compared the mean activity associated with unsigned PE and negative associability signals in those areas [associability beta values were sign-inverted for the purpose of this analysis to indicate the strength (and not the direction) of the correlation]. More specifically, we extracted the betas for both signals from all voxels falling into the CM and BLA, respectively. The CM was approximated by a combination of the bilateral superficial and the centromedial amygdala masks and the BLA was defined by bilateral basolateral amygdala masks using the maximum probability maps to define regions of interest (Eickhoff et al., 2005). A 2 × 2 repeated-measures ANOVA with factors region (CM, BLA) and signal (unsigned PE, negative associability) on the mean beta coefficients from individual subjects revealed a significant region-by-signal interaction (F1,20 = 12.39, P <0.01) indicating that the two subdivisions of the amygdala are differentially engaged in representing the unsigned PE and negative associability (Fig. 5B).
In addition, subsequent t-tests showed that the unsigned PE correlated significantly more strongly with activity in the CM than in the BLA (t20 = 2.54, P <0.05), whereas the negative associability function revealed a larger correlation with BOLD responses in the BLA as compared with the CM (t20 = 2.76, P <0.05).
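For a 2 × 2 design in which both factors vary within subjects, the interaction test is equivalent to a one-sample t-test on each subject's difference of differences, with F(1, n − 1) equal to the squared t-value. A minimal sketch of this equivalence follows. The beta values are hypothetical and the function names are illustrative.

```python
import math


def one_sample_t(xs):
    """t statistic of the mean of xs against zero."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean / math.sqrt(var / n)


def interaction_f(cm_pe, bla_pe, cm_assoc, bla_assoc):
    """Region-by-signal interaction from per-subject mean betas.

    Returns (F, denominator degrees of freedom)."""
    double_diff = [(a - b) - (c - d)
                   for a, b, c, d in zip(cm_pe, bla_pe, cm_assoc, bla_assoc)]
    t = one_sample_t(double_diff)
    return t * t, len(double_diff) - 1


# Hypothetical mean betas of four subjects
f_value, df = interaction_f(
    cm_pe=[2.0, 3.0, 4.0, 3.0],
    bla_pe=[1.0, 1.0, 1.0, 1.0],
    cm_assoc=[1.0, 1.0, 1.0, 1.0],
    bla_assoc=[2.0, 2.0, 3.0, 3.0],
)
print(round(f_value, 2), df)  # 29.4 3
```

A positive double difference indicates that the unsigned PE is weighted more strongly in the CM than in the BLA relative to the negative associability signal.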
In this study, we used a Pavlovian reversal-learning task to investigate associative learning. Expectancy ratings and SCR amplitudes were higher for CS+ as compared with CS– conditions during acquisition and reversal, indicating that participants successfully learned the CS–US contingencies in both stages of the experiment. Expectancy values were used in turn for model fitting and model comparison, which confirmed the hypothesis that an RW/PH hybrid model provided a significantly more accurate explanation of behaviour than an RW learning rule in line with previous accounts (Le Pelley, 2004; Li et al., 2011). BOLD responses in the CM and ventral midbrain tracked the unsigned PE at the time of outcome, whereas activity in the BLA correlated negatively with associability at the time of CS onset.
Dopamine neurons in the ventral midbrain in monkeys have recently been shown to signal unexpected positive and negative events similar to unsigned PEs (Matsumoto & Hikosaka, 2009) in addition to their well-known role in the encoding of signed PEs (Schultz & Dickinson, 2000). Likewise, the amygdala has been shown to be sensitive to unexpected events irrespective of their valence (Belova et al., 2007; Metereau & Dreher, 2012) and to unpredictability itself (Herry et al., 2007). Also, unsigned PE signals have been reported during reward learning in the rodent amygdala (Roesch et al., 2010). Our findings are in line with these reports, demonstrating for the first time an unsigned PE signal during aversive learning in the human amygdala and in the ventral midbrain. The unsigned PE reported here represents a US processing signal that is large for unexpected shocks and unexpected omissions, and has equal characteristics for CS– and CS100 as it decreases when outcomes become more expected and increases again at the beginning of the reversal stage. Being derived from an RW/PH hybrid learning model, it reflects a signal of immediate surprise that guides attention to unexpected outcomes and thereby reinforces subsequent learning.
In particular, the central nucleus of the amygdala (CE; located within the CM) is widely known for its critical role in mediating attention and vigilance, and many lesion studies in rodents have shown that a circuitry including the CE is critical for surprise/attention-induced enhancement of learning (Holland & Gallagher, 1999; Davis & Whalen, 2001). In a typical experimental setting, rats are trained to a light–tone sequence. Omission of the tone increases attention to the light and accelerates subsequent learning of light–food associations. The surprise-induced enhancement of learning was, however, absent in rats with lesions of the CE (Holland & Gallagher, 1993). Equally, rats in which the communication between the CE and SN was disrupted showed no surprise-enhanced learning, and CE–SN projections have been suggested to reflect PE information in appetitive conditioning (Lee et al., 2006, 2010). Intact CE functioning and CE–SN communication were, however, only critical at the time of surprise and not at the time when surprise-enhanced learning was tested (Holland & Gallagher, 2006; Lee et al., 2008). In the present study, PE-related BOLD responses occurred in the corticomedial subregion of the amygdala and in the SN/ventral tegmental area, and the computational approach of the study further indicates that these signals are important for the updating of expectations at a later point in time. Mirroring the results obtained in rodents, our findings therefore strongly suggest a crucial role of the CM and SN/ventral tegmental area in the signalling of surprise, which is related to a later enhancement of learning. Apart from the amygdala and the midbrain, the unsigned PE also correlated with activity in the anterior insula. This region has been shown before to be activated by salience rather than valence in associative learning (Seymour et al., 2005; Metereau & Dreher, 2012), in addition to its frequently highlighted role in signalling uncertainty and risk (Mohr et al., 2010).
In a previous fMRI study, Li et al. (2011) reported a positive correlation of amygdala activity and associability at the time of outcome. Given that the unsigned PE and associability are correlated, this result fits with the current finding of a representation of the unsigned PE in the CM at the time of outcome. Both results can be interpreted as reflecting surprise or attentional changes in response to unexpected shocks or omissions. In the current study, however, associability was used to modulate the CS onset event and not the US onset event. This approach is based on the theoretical description of associability as a property of the CS in the original PH model and in the hybrid model (Pearce & Hall, 1980; Le Pelley, 2004). Furthermore, although the associability information is in principle available as soon as the PE occurs, it only affects the update of the value when the next CS is presented. Thus, associability in the current study reflects the reliability of prior outcome expectations at the point in time when this information is used to update current outcome expectations, that is, whenever a new CS is presented.
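The timing argument can be made explicit with the update rules of a hybrid model. The following is a sketch in the commonly used form of the RW/PH hybrid (following Le Pelley, 2004, and Li et al., 2011); the symbols $\kappa$ (learning-rate constant), $\eta$ (associability update weight) and the trial index $t$ are notational assumptions here and not necessarily the notation used elsewhere in this paper.

```latex
\begin{align}
\delta_t          &= r_t - V_t(x)                                  && \text{signed PE at outcome} \\
V_{t+1}(x)        &= V_t(x) + \kappa\,\alpha_t(x)\,\delta_t        && \text{value update, scaled by associability at the CS} \\
\alpha_{t+1}(x)   &= \eta\,\lvert\delta_t\rvert + (1-\eta)\,\alpha_t(x) && \text{associability carried to the next CS}
\end{align}
```

In this form, the unsigned PE $\lvert\delta_t\rvert$ on trial $t$ enters $\alpha_{t+1}(x)$ and therefore scales learning only from trial $t+1$ onwards, which is why associability is treated as a CS-related quantity whereas $\lvert\delta_t\rvert$ is an outcome-related quantity.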
We observed a negative correlation between associability and brain activity in the BLA. We refer to this negative associability signal in this study as reflecting predictiveness. Predictiveness represents a CS processing signal, which is large when prior outcome predictions are reliable and small when outcome predictions are poor. It increases during acquisition in all conditions, decreases at the beginning of the reversal stage and increases again when the reversed CS–US contingencies are learned. The finding of a predictiveness signal in the BLA ties in with the amygdala's role in learning to predict aversive outcomes (Glascher & Buchel, 2005; Schiller et al., 2008). It further fits with the findings of a recent fMRI study reporting increased amygdala responses to predictive as compared with nonpredictive cues that received the same pairing with the US in a blocking paradigm (Eippert et al., 2012). In accordance with the current results, this study demonstrated that the predictive value of CSs determines amygdala responses during fear conditioning in humans. Moreover, studies in animals demonstrated that the BLA is particularly critical for normal performance in a variety of settings that require knowledge of current outcome values including reversal learning and second-order conditioning (Lindgren et al., 2003; Schoenbaum et al., 2003; Johnson et al., 2009).
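The described time course of predictiveness can be illustrated with a small simulation. The sketch below assumes the commonly used form of the RW/PH hybrid update (Le Pelley, 2004; Li et al., 2011) and deterministic outcomes; the function name `simulate_hybrid`, the parameter values (`kappa`, `eta`), the starting values (`v0`, `alpha0`) and the number of trials per stage are illustrative assumptions, not the fitted estimates or the design of the present study.

```python
def simulate_hybrid(outcomes, kappa=0.5, eta=0.3, v0=0.5, alpha0=1.0):
    """Simulate an RW/PH hybrid learner for a single cue.

    All parameter and starting values are illustrative assumptions.
    outcomes: sequence of 1 (shock) or 0 (no shock), one entry per trial.
    """
    value, alpha = v0, alpha0
    trials = []
    for shock in outcomes:
        associability_at_cs = alpha           # available when the CS appears
        delta = shock - value                 # signed PE at the time of outcome
        value += kappa * associability_at_cs * delta
        # The unsigned PE updates associability, which acts at the NEXT CS.
        alpha = eta * abs(delta) + (1.0 - eta) * alpha
        trials.append({
            "associability": associability_at_cs,
            "predictiveness": -associability_at_cs,  # negative associability
            "unsigned_pe": abs(delta),
        })
    return trials


n = 20  # hypothetical number of trials per stage
cs100 = simulate_hybrid([1] * n + [0] * n)     # always shocked, then never
cs_minus = simulate_hybrid([0] * n + [1] * n)  # never shocked, then always

# Print a few landmark trials: start, end of acquisition, reversal onset, end.
for t in (0, n - 1, n, n + 1, 2 * n - 1):
    row = cs100[t]
    print(t + 1, round(row["predictiveness"], 3), round(row["unsigned_pe"], 3))
```

Under these assumptions, predictiveness rises across acquisition, drops only on the trial after the first surprising reversal outcome, and recovers as the new contingency is learned. With a symmetric starting value, the unsigned PE follows the same course for the simulated CS100 and CS– cues.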
Thus, our finding of a predictiveness signal in the BLA supports the view that the predictive value of CSs is critical for amygdala responses during fear conditioning. On the one hand, the BLA has been highlighted as a site of plasticity in associative learning that is relevant for learning and maintaining CS–US associations (Maren & Quirk, 2004; Reijmers et al., 2007; Ehrlich et al., 2009; Pape & Pare, 2010), and CS and US information is assumed to converge in this region (Barot et al., 2008). Accordingly, increasing predictiveness and concomitantly increased BOLD responses in the BLA might reflect strengthening of the associative memory with regard to CS–US contingencies. This assumption would, however, require that associative learning also occurs in the CS– condition, as the predictiveness signal shows equal characteristics for CS100 and CS–. On the other hand, some recent studies demonstrated that learning of CS–US associations increased over time when subjects were contingency aware (Schiller et al., 2010; Raio et al., 2012). These findings mirror the observed time course of the predictiveness signal in the current study. Predictiveness might therefore also reflect contingency awareness, which is likely to increase with increasing reliability of outcome predictions.
To strengthen the finding of separate recruitment of the BLA and CM by predictiveness and surprise signals, we directly compared the mean activity in both regions. Unsigned PEs were found to correlate with signal changes in the CM but not the BLA, whereas the opposite was true for predictiveness signals, indicating a clear functional dissociation between the two regions. With respect to interactions between the BLA and CM during aversive learning in humans, we can only speculate, as the current study does not allow firm conclusions to be drawn. However, as projections from the BLA to the CM are not reciprocated in the amygdala (Pape & Pare, 2010), we would assume that the surprise signals in the CM project onto cortical areas, which then project back to the BLA where predictiveness as a derivative of these signals controls learning of cue–outcome associations.
To summarize, we extended recent findings of PH-like learning signals in the amygdala (Li et al., 2011) by investigating CS- and US-related processing in an RW/PH hybrid model of reinforcement learning. By combining this approach with high-resolution fMRI, we demonstrate a unique functional dissociation of amygdala subregions during associative learning in humans. Our data show that activity in the CM as well as in the SN/ventral tegmental area correlated with surprise signals at the time of outcome. BOLD responses in the BLA, in contrast, represented predictiveness at the time of CS presentation, presumably reflecting strengthening of the associative memory or increasing contingency awareness. Overall, our results converge with findings from other species and help to bridge the gap with animal neurophysiology.
This work was supported by a DFG grant (GRK 1247) and DFG SFB TRR 58. We thank Catherine Hindi Attar and Stephan Geuter for helpful discussions regarding this work.