Colloquially, we all know what it means to perform an action by ‘habit’. But scientifically, characterizing behaviors as such is not so straightforward. Indeed, while addictive behaviors are often framed in the context of habit learning (Graybiel, 2008), this remains a controversial issue in the neurobiology of addiction literature (Berridge, 2007; Robbins & Everitt, 2007). For example, does one light up a cigarette as an instinctive response to cues that are generally associated with that motor program (due to prior stimulus-response reinforcement learning)? Or rather, do such behaviors reflect a goal-directed process in which the craving of a particular desired outcome (i.e. the nicotine high) is used to flexibly determine any complex sequence of actions needed to achieve it? Both of these processes likely contribute to habit-like behavior, but teasing them apart (at both cognitive and neurobiological levels) is critical.
In animal research, habits are typically operationalized as behaviors that are so ingrained as to be insensitive to their consequent outcomes, as evaluated with a devaluation procedure. Animals are first trained to press a lever to produce a reward. Next, the reward itself is devalued so that it is no longer desirable to them (e.g. by providing free access to the reward so that they are satiated). The animals are then presented with the lever to determine whether they respond in a ‘goal-directed’ manner by reducing lever presses, indicating that their actions are sensitive to the (no longer desired) outcome. However, given sufficient prior training, animals will continue to press the lever ‘by habit’ despite the fact that they no longer want the reward. In rats, it is now well established that the dorsolateral striatum is responsible for ingraining stimulus-response associations that lead to habit-like behavior, whereas prefrontal cortical regions are required for updating action-value representations and allowing the animal to exert goal-directed behavior (Yin et al., 2004).
By contrast, despite years of research on procedural and reinforcement learning, evidence for veridical habit learning in humans is lacking. In this issue of EJN, Tricomi et al. (2009) investigated this question with functional magnetic resonance imaging (fMRI) and behavioral testing over multiple sessions. They used a variable interval reinforcement schedule known to promote habitual behavior in rodents. Participants were presented with fractal stimulus cues and learned to press one of two different keys as often as they wished. Each cue, if responded to with the correct key and at the appropriate time, would yield a different food reward (corn chips or chocolate bars). After the training session, the devaluation procedure was administered: participants were given free access to one of the two rewards until they were sated on that food. In the critical test phase, when presented with the fractal cues, those who had undergone only a single day of training behaved in a goal-directed manner: they responded selectively less to the stimulus associated with the devalued reward. In stark contrast, those who had undergone three days of training continued to respond ‘by habit’ to both cues as if the devaluation procedure had never occurred.
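For readers less familiar with the schedule, a variable interval contingency can be sketched in a few lines of Python. This is an illustrative toy, not the task code of Tricomi et al. (2009): the function name `simulate_vi`, the exponentially distributed waiting times and the 10 s mean interval are all assumptions chosen for the example.

```python
import random


def simulate_vi(response_times, mean_interval=10.0, seed=0):
    """Return the times of rewarded responses under a toy variable interval schedule.

    A reward becomes available after a random waiting time (assumed here to be
    exponentially distributed). The first response made once the reward is
    available collects it, and a new waiting time then begins.
    """
    rng = random.Random(seed)
    next_available = rng.expovariate(1.0 / mean_interval)
    rewarded = []
    for t in sorted(response_times):
        if t >= next_available:
            rewarded.append(t)
            # Start the next (hypothetical) waiting period from this response.
            next_available = t + rng.expovariate(1.0 / mean_interval)
    return rewarded


# Compare one response per second with two responses per second over 600 s.
slow = simulate_vi([float(i) for i in range(1, 601)])
fast = simulate_vi([0.5 * i for i in range(1, 1201)])
print(len(slow), len(fast))
```

Under these assumptions, doubling the response rate earns at most slightly more rewards, so the link between how much one responds and how much one obtains is weak. That weak response-outcome contingency is one commonly proposed reason why such schedules favor habit formation.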
What are the brain mechanisms of such habitual responding? Notably, in the imaging analysis, an area within the dorsolateral striatum – specifically the right posterior putamen and the globus pallidus to which it projects – showed greater activation at the end of the three-day training procedure than at the beginning. These findings are remarkably consistent with those implicating homologous areas in rodent habitual behavior (Yin et al., 2004). Furthermore, activations in the ventromedial prefrontal cortex (vmPFC), a region previously shown to be sensitive to changes in subjective value following devaluation (Valentin et al., 2007), appeared to ‘ramp up’ on each trial in anticipation of reward outcomes – consistent with an action-outcome goal-directed process.
How do brain areas involved in habitual versus goal-directed behavior interact? Whereas basal ganglia activations increased across training, suggestive of a progressive reinforcement learning process, the putative action-outcome representations in vmPFC continued to ramp up in anticipation of reward throughout all training sessions. Thus habitual responding appears to result not from a decrease in the anticipation of reward outcomes across training, but rather from a strengthening of stimulus-response links. This interpretation is consistent with computational models in which habitual responding emerges even as both striatal and prefrontal systems represent their respective estimates of reward likelihood for the action at hand, but with the striatum progressively exerting a dominant bias on behavioral control (Daw et al., 2005; Frank & Claus, 2006).
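The logic of this account can be illustrated with a deliberately simple Python sketch. It is not the model of Daw et al. (2005) or of Frank & Claus (2006): the delta-rule habit update, the Hill-shaped weighting function, the figure of 100 trials per training day and every parameter value are assumptions made purely for illustration. Both systems keep their own estimate throughout, and only the weight given to the stimulus-response system grows with training.

```python
def habit_strength(n_trials, alpha=0.05):
    """Stimulus-response strength after n rewarded trials (delta-rule update)."""
    strength = 0.0
    for _ in range(n_trials):
        strength += alpha * (1.0 - strength)  # move toward a reward of 1
    return strength


def habit_weight(n_trials, half_point=200, steepness=4):
    """Hypothetical share of behavioral control held by the habit system."""
    return n_trials**steepness / (n_trials**steepness + half_point**steepness)


def response_tendency(n_trials, outcome_value):
    """Blend the habit estimate with the goal-directed estimate.

    The goal-directed system is assumed to track the current outcome value
    exactly (1.0 when valued, 0.0 after devaluation) at every stage of training.
    """
    w = habit_weight(n_trials)
    return w * habit_strength(n_trials) + (1.0 - w) * outcome_value


# Assumed 100 trials per training day; the test is run without new rewards.
for label, n in (("1 day", 100), ("3 days", 300)):
    valued = response_tendency(n, 1.0)
    devalued = response_tendency(n, 0.0)
    print(f"{label}: valued={valued:.2f} devalued={devalued:.2f}")
```

With these illustrative settings, responding for the devalued outcome collapses after one day of training but remains high after three, even though the goal-directed estimate is equally accurate in both cases.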
In summary, these data provide compelling evidence for common neurobiological substrates of habit learning in humans and rodents. Future studies should investigate the degree to which these processes are altered in clinical disorders, including obsessive-compulsive disorder and addiction.