Extinction of cue‐evoked food‐seeking recruits a GABAergic interneuron ensemble in the dorsal medial prefrontal cortex of mice

Animals must quickly adapt food‐seeking strategies to locate nutrient sources in dynamically changing environments. Learned associations between food and environmental cues that predict its availability promote food‐seeking behaviors. However, when such cues cease to predict food availability, animals undergo “extinction” learning, resulting in the inhibition of food‐seeking responses. Repeatedly activated sets of neurons, or “neuronal ensembles,” in the dorsal medial prefrontal cortex (dmPFC) are recruited following appetitive conditioning and undergo physiological adaptations thought to encode cue‐reward associations. However, little is known about how the recruitment and intrinsic excitability of such dmPFC ensembles are modulated by extinction learning. Here, we used in vivo 2‐Photon imaging in male Fos‐GFP mice that express green fluorescent protein (GFP) in recently behaviorally activated neurons to determine the recruitment of activated pyramidal and GABAergic interneuron dmPFC ensembles during extinction. During extinction, we revealed a persistent activation of a subset of interneurons which emerged from a wider population of interneurons activated during the initial extinction session. This activation pattern was not observed in pyramidal cells, and extinction learning did not modulate the excitability properties of activated pyramidal cells. Moreover, extinction learning reduced the likelihood of reactivation of pyramidal cells activated during the initial extinction session. Our findings illuminate novel neuronal activation patterns in the dmPFC underlying extinction of food‐seeking, and in particular, highlight an important role for interneuron ensembles in this inhibitory form of learning.


| INTRODUCTION
Animals need to efficiently adapt to changes in the predictive value of environmental "cues" which signal food availability to optimize energy use when foraging (MacArthur & Pianka, 1966). This includes being able to learn how to inhibit food-seeking behaviors after cues that previously predicted food availability cease to do so. This type of inhibitory learning is modeled in the laboratory using an extinction procedure, in which a previously established association between food (unconditioned stimulus, US) and a stimulus that reliably predicted its availability (conditioned stimulus, CS) is weakened by presenting the CS in the absence of the US. Typically, following this procedure, the CS alone will no longer elicit food-related behaviors, such as approach behaviors toward the location where food was previously made available (Pavlov, 1927; van den Akker, Schyns, & Jansen, 2018; Ziminski et al., 2017). Crucially, it is well known that while the conditioned response ceases to be expressed, the original underlying CS-US association remains intact and is actively "inhibited" rather than unlearned. This idea is supported, for instance, by phenomena such as spontaneous recovery, in which an "extinguished" CS-evoked response re-emerges as a result of the mere passage of time due to a failure to retrieve extinction memories (Bouton, 1993; Pavlov, 1927; Pearce & Hall, 1980; Ziminski et al., 2017).
The dorsal region of the medial prefrontal cortex (dmPFC) is involved in the discrimination of food-predictive cues (Bussey, Everitt, & Robbins, 1997;Cardinal et al., 2002;Parkinson, Cardinal, & Everitt, 2000). Recent work has demonstrated that in this brain area, sparse sets of strongly and repeatedly activated neurons called "neuronal ensembles" mediate food-seeking (Whitaker et al., 2017). In our recent study using in vivo 2-Photon imaging, we found persistent activation of subsets of neurons, that is, stable neuronal ensembles, in the dmPFC during acquisition of an appetitive CS-US association (Brebner et al., 2020). In the prefrontal cortex, the coordinated actions of excitatory, glutamatergic pyramidal cells and inhibitory, GABAergic interneurons are thought to control food-seeking (Warren et al., 2016;Ziminski et al., 2017). Furthermore, there is evidence that interneurons in the dmPFC are involved in extinction learning (Courtin et al., 2014;Sparta et al., 2014). Since their activation can modulate the size of pyramidal cell ensembles, they are prime candidates in inhibiting the original CS-US memory trace (Morrison et al., 2016;Stefanelli, Bertollini, Lüscher, Muller, & Mendez, 2016). While prefrontal cortex GABAergic interneurons only represent ~10% of neurons that are strongly activated during food memory recall (Warren et al., 2016;Ziminski et al., 2017), they exert widespread influence over the activity of many neurons (Tremblay, Lee, & Rudy, 2016).
Changes in neurons' intrinsic excitability control their ability to fire, and thus critically modulate the fidelity of information transfer across neuronal networks (Daoudal & Debanne, 2003;Kourrich, Calu, & Bonci, 2015). In addition to the recruitment of a neuronal ensemble during the acquisition of a CS-US association in the dmPFC, we found that this acquisition dynamically modulated the excitability of behaviorally activated neurons in this area (Brebner et al., 2020). To date, little is known about the recruitment dynamics and excitability alterations of behaviorally activated neurons in the dmPFC as a function of extinction learning. We addressed this knowledge gap by using microprism-based 2-Photon imaging in Fos-GFP × GAD-tdTomato (FGGT) mice (Brebner et al., 2020). In these mice, the activated neurons are identified by the presence of Fos-GFP, a fusion protein of Fos and green fluorescent protein (GFP; Barth, Gerkin, & Dean, 2004;Brebner et al., 2020;Ziminski et al., 2017;Ziminski, Sieburg, Margetts-Smith, Crombag, & Koya, 2018) and the red fluorescent protein td-Tomato under the control of the GAD65 gene promoter. This allowed us to differentially observe both recently activated pyramidal cells (tdTomato−) and interneurons (tdTomato+) as well as longitudinally track ensemble formation in vivo during extinction learning in the dmPFC (Barth et al., 2004;Besser et al., 2015;Brebner et al., 2020). Additionally, we investigated the excitability of behaviorally activated neurons during the early and late stages of extinction learning.
We observed that a persistently activated subset of interneurons emerged from a wider population of interneurons activated during the initial extinction session. In contrast, in activated pyramidal cells, we observed no such extinction-related ensemble formation nor altered excitability properties. However, extinction learning attenuated the likelihood of reactivation of pyramidal cells activated on the initial extinction session.

| Animals
Heterozygous (het) male Fos-GFP (RRID:IMSR_JAX:014135; Barth et al., 2004) and GAD-tdTomato mice (RRID:IMSR_EM:10422) were bred onto a C57BL/6 background. GAD-tdTomato mice express tdTomato in GAD65-expressing neurons which consist of a heterogenous interneuron population including ~40%, with ~25% calretinin and ~28% somatostatin-expressing interneurons. Het male GAD-tdTomato mice were bred with het Fos-GFP female mice to produce double transgenic Fos-GFP × GAD-tdTomato (FGGT) mice as previously described (Brebner et al., 2020). Preliminary studies showed that both Fos-GFP and FGGT mice reliably acquired CS-US associations in a Pavlovian conditioning task and exhibited extinction learning, as well as similar excitability profiles of pyramidal cells. As such, FGGT male mice were used for 2-Photon imaging experiments and Fos-GFP male mice were used for ex vivo electrophysiology experiments. All mice were housed under a 12-hr light/dark cycle (lights on at 7:00 a.m.) at a maintained temperature of 21 ± 1°C and 50 ± 5% relative humidity. For all experiments, mice were aged 7-13 weeks at the beginning of experimental procedures and were food-restricted (to 90% of baseline (free-feeding) body weight) from 1 week prior to behavioral testing until the completion of the behavioral experiments. Experiments were conducted in accordance with the UK 1986 Animal Scientific Procedures Act (ASPA) and received approval from the University of Sussex Animal Welfare and Ethical Review Board.

| Microprism implantation in FGGT mice
At ages 10-13 weeks, FGGT mice were implanted with a microprism in the dmPFC. Microprism constructs were built by assembling 2 circular glass windows (5 mm and 3 mm diameter; #1 thickness, cat. no: 64-0700 and 64-0720, Warner Instruments) and a 1.5 mm coated microprism (Model no: MPCH-1.5, part no: 4531-0023, Tower Optics) using optical glue (Norland Optical Adhesive), such that the microprism rested on the 3 mm window with its vertical imaging edge on the diameter. Mice were anesthetized with isoflurane at 3% dilution in O2 (0.8 L/min) and NO2 (0.5 L/min) and maintained at between 1% and 2% dilution throughout the surgery. To reduce cerebral inflammation, mice first received an injection of dexamethasone (Dexadreson, 5 mg/kg, s.c. or i.m.). The skin on their scalp was removed and the skin around the section region was glued to the skull (Vetbond, 3M). The bone was then scored before a set of custom head-bars was fixed to the skull using dental cement (Unifast TRAD). A 3 mm circular opening was created in the skull centered at 0 mm on the medio-lateral axis and at bregma 0.8 mm on the rostro-caudal axis (±0.2 mm according to the location of blood vessels). The final area observable through the microprism spanned approximately from bregma 0.05 to 1.55 mm on the rostro-caudal axis and from 0 to 1.5 mm on the dorso-ventral axis (of note, the most dorsal section was usually obscured by the central sinus). The vast majority of this area constitutes the anterior cingulate cortex of the mPFC (Figure 3a; Paxinos & Franklin, 2001). Microprism implantation occurred as previously described (Brebner et al., 2020; Low, Gu, & Tank, 2014). The dura was removed and the microprism construct was lowered into the brain using a custom-built holder such that the microprism was positioned between the hemispheres with the imaging surface placed against the sagittal surface of one of the hemispheres (Figure 2a). 
The construct was glued with Vetbond and further fixed in place with dental cement (Unifast TRAD). Following implantation, mice received buprenorphine (0.1 µg/kg, i.m.) and were left to recover in a heated chamber for at least 1 hr. Following surgery, they received 3 days of oral meloxicam (0.2 ml/day of a 1.5 mg/ml meloxicam solution in wet mash; Metacam, Boehringer). All mice recovered for a minimum of two weeks before undergoing any further procedures and the first imaging session typically occurred 3-4 weeks following surgery, to allow inflammation in the imaging area to subside.

| General procedures
Similar behavioral experimental procedures and apparatus were utilized as in Brebner et al. (2020) and Ziminski et al. (2017). Briefly, behavioral experiments were performed in standard mouse conditioning chambers (15.9 × 14 × 12.7 cm; Med Associates), each fitted with a recessed magazine for dispensing boluses of ~15 µl of 10% sucrose solution that served as the unconditioned stimulus (US). A mechanical relay was used to generate sequences of intermittent clicks that served as the conditioned stimulus (CS). An infrared beam detected head entries into the sucrose delivery magazine. Mice were randomly assigned to "Paired" or "Unpaired" conditions and underwent identical procedures, except that in the unpaired condition, mice only received sucrose in their home cage at random times (1-4 hr) before or after each acquisition session. As such, this condition controlled for factors such as the effects of handling, chamber, CS and US exposure. One day following an initial "magazine training" session (to familiarize mice with the procedures, and mice in the paired condition with the sucrose delivery magazine), mice underwent 12 acquisition sessions over a 7-day period, with 1 or 2 sessions per day. Each 25 min acquisition session consisted of six 120 s CS (click) presentations, separated by 120 s random-interval (RI) inter-trial interval (ITI) periods. In the paired condition, during each CS period, 10% sucrose was delivered into the magazine on an RI30 schedule. The extinction phase commenced 3-4 days following the last acquisition session. Mice in both the paired and unpaired conditions underwent extinction sessions once per day for 7-9 days. Extinction sessions were identical to the acquisition sessions with the exception that no sucrose solution was available for mice in either the paired or unpaired condition. In addition, no mice received sucrose in their home cages during the extinction phase.
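The session structure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exponential draw used for RI30 inter-delivery intervals, the fixed 120 s ITI (the text describes the ITIs as random-interval periods), the seed and all function names are hypothetical and are not taken from the conditioning program actually used.

```python
import random

CS_DURATION_S = 120   # length of each CS (click) presentation
N_CS = 6              # CS presentations per session
ITI_S = 120           # ITI, held fixed here as a simplification (assumption)
RI_MEAN_S = 30        # RI30: mean interval between sucrose deliveries


def ri_delivery_times(rng, duration_s=CS_DURATION_S, mean_s=RI_MEAN_S):
    """Draw delivery times (s from CS onset) within one CS period.

    Exponentially distributed intervals are an assumption of this sketch.
    """
    times = []
    t = rng.expovariate(1.0 / mean_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(1.0 / mean_s)
    return times


def build_session(rng, paired=True):
    """Return [(cs_onset_s, [absolute delivery times])] for one session."""
    session = []
    clock = ITI_S  # each CS is preceded by an ITI
    for _ in range(N_CS):
        # Unpaired mice (and all mice in extinction) get no sucrose in the chamber.
        deliveries = [clock + d for d in ri_delivery_times(rng)] if paired else []
        session.append((clock, deliveries))
        clock += CS_DURATION_S + ITI_S
    return session


rng = random.Random(0)  # fixed seed so the sketch is reproducible
paired_session = build_session(rng, paired=True)
unpaired_session = build_session(rng, paired=False)
```

With these settings the last CS ends 24 min after session start, which is consistent with the stated session length of about 25 min. An extinction session corresponds to `build_session(rng, paired=False)` for both groups.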

In vivo imaging experiment
Acquisition sessions proceeded in mice in the paired and unpaired conditions as described in Section 2.3.1, with a protruding feeding port to accommodate mice equipped with a head-restraint device. For acquisition sessions 1 (S1), 5 (S5) and 11 (S11), mice in the unpaired condition received sucrose in their home cage 10 min before training; for all other sessions, sucrose was delivered at a random time during the day. Three to four days following the last acquisition session, mice underwent 7 extinction sessions (1 session/day).

Ex vivo electrophysiology experiment
Fos-GFP mice were randomly assigned to E1 or E7 groups. Following acquisition, mice in the E1 group received only a single extinction session, and those in the E7 group received 7-9 extinction sessions, before being sacrificed for electrophysiology recordings.

| Intrinsic excitability recordings
Pyramidal cells were held at −65 mV for the duration of recording. The current clamp protocol consisted of 800 ms positive current injections from −60 pA incrementing in 4 pA steps. The liquid junction potential was −13.7 mV and was not accounted for. Spike counts were conducted using Stimfit (Guzman, Schlögl, & Schmidt-Hieber, 2014) while spike kinetics were analyzed with MiniAnalysis software (MiniAnalysis, Synaptosoft). Spike threshold was measured using the third differential with MiniAnalysis software. The action potential (AP) peak was calculated as the difference between the AP peak and AP threshold. Half-width was measured as the AP width at half-maximal spike following cubic spline interpolation to increase sampling rate by a factor of 4. Post-spike fast and medium afterhyperpolarizations (fAHPs and mAHPs) were measured ~3 and 40 ms following the AP threshold, respectively, similar to Ishikawa et al. (2009).
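To make the AP peak and half-width definitions concrete, the following Python sketch computes both from a single-spike voltage trace. It is a simplified illustration, not the MiniAnalysis procedure: linear interpolation stands in for the cubic spline described above, the threshold is passed in rather than detected from the third differential, "half-maximal" is assumed to mean halfway between threshold and peak, and all function names are hypothetical.

```python
def upsample_linear(trace, factor=4):
    """Increase the sampling rate by `factor` using linear interpolation.

    Note: the text describes cubic spline interpolation; linear
    interpolation is swapped in here to keep the sketch dependency-free.
    """
    out = []
    for a, b in zip(trace, trace[1:]):
        for k in range(factor):
            out.append(a + (b - a) * k / factor)
    out.append(trace[-1])
    return out


def ap_peak_and_half_width(trace_mv, dt_ms, threshold_mv, factor=4):
    """Return (AP peak relative to threshold in mV, half-width in ms).

    Assumes `trace_mv` contains exactly one action potential.
    """
    fine = upsample_linear(trace_mv, factor)
    fine_dt = dt_ms / factor
    amplitude = max(fine) - threshold_mv           # peak minus threshold
    half_level = threshold_mv + amplitude / 2.0    # assumed half-maximal level
    above = [i for i, v in enumerate(fine) if v >= half_level]
    half_width = (above[-1] - above[0]) * fine_dt  # width at half-maximal level
    return amplitude, half_width


# Idealized triangular spike sampled every 0.1 ms (hypothetical data).
spike = [-40, -20, 0, 20, 40, 20, 0, -20, -40]
amplitude, half_width = ap_peak_and_half_width(spike, dt_ms=0.1, threshold_mv=-40)
```

For real recordings a spline-based upsampling step would give a smoother estimate near the half-maximal crossings than the linear version used here.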

| Habituation and imaging sessions
2-Photon imaging experiments were conducted as described in Brebner et al. (2020). Imaging sessions took place on head-fixed, awake mice that were able to freely run on a polystyrene cylinder (Figure 2c). For ~1 week prior to the first imaging session, mice were habituated to being restrained by being head-fixed regularly for increasing durations. Following habituation, the brain surface under the microprism was assessed and 2-3 areas of interest were defined. In each area of interest, z-stacks in both the red and green channels were recorded simultaneously at an excitation wavelength of 970 nm (power at the objective: 70-130 mW; pixel dwell time: ~3.9 ns) from the pial surface to a depth of approximately 300 µm. Each slice of the stack was an average of two 660.14 × 660.14 µm images (corresponding to 512 × 512 pixels; pixel size: 1.2695 × 1.2695 µm). Images were captured in pre-defined areas of interest using a Scientifica multiphoton microscope (Uckfield, UK) with a 16× water immersion objective (CFI LWD Plan Fluorite Physiology objective, NA 0.8, WD 3 mm; Nikon Corporation) and a Chameleon Vision-S Ti:Sapphire laser with dispersion precompensation (Chameleon, Coherent). The software used for recording was ScanImage r3.8 (Pologruto, Sabatini, & Svoboda, 2003).
Imaging sessions took place 75 min following the start of the 1st, 3rd and 7th extinction session, as well as the 1st, 5th and 11th conditioning session (Figure 2c). Another two imaging sessions took place directly from the home cage (2-3 days prior to conditioning and 2-3 days after the last extinction session). Imaging sessions typically lasted 40-60 min. Of note, GFP expression observed during imaging is unlikely to be caused by previous behavioral sessions, as imaging took place exclusively following AM sessions, approximately 18 hr after the previous PM session, by which time Fos-GFP expression returns to baseline levels (Brebner et al., 2020). Two mice (1 unpaired, 1 paired) were excluded due to poor imaging quality on one or several imaging sessions and 1 mouse (unpaired) was excluded due to abnormal GFP+ counts in one session (identified with Grubbs's test, α = .05).

| Image analysis
Image analysis methods were similar to those utilized in Brebner et al. (2020). Initial image processing took place in FIJI (ImageJ; Schindelin et al., 2012). tdTomato images within a stack were aligned on the x and y axes with MultiStackReg (Thevenaz, Ruttimann, & Unser, 1998). The resulting transformation was then applied to the GFP image stack. Stacks were aligned between sessions using the Landmark Correspondence plugin (Stephen Saalfeld, HHMI Janelia Farms, Ashburn, VA, USA). A volume within layers II/III common to all sessions was identified and selected. For both imaging and analysis, layers II/III were identified primarily according to distance from the surface of the tissue (approximately between 100 and 300 µm from surface) as well as according to visual increases in cell density (as compared to layer I). All images in the selected stacks were despeckled and an FFT bandpass filter (upper threshold 40 pixels, lower threshold 5 pixels) was applied. Local maxima (noise tolerance: 30 pixels) were identified and the signal within a disk around the maxima (12 pixel diameter (15.234 µm) for GFP "signal" and 16 pixel diameter (20.312 µm) for tdTomato signal) was compared to the surrounding "noise" (2.5390 µm thick band, 1.2695 µm away from the disk). A cell was considered GFP+ or tdTomato+ if the "signal" > "noise" + 2.3 SD (noise), for at least two consecutive slices in the stack (Figure 2b). Positive cells were recorded in an empty 3D matrix the size of the stack and later the x, y, z coordinates of each cell were extracted from the matrix using 3D object counter (Bolte & Cordelières, 2006).
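The positivity criterion above ("signal" > "noise" + 2.3 SD of the noise, in at least two consecutive slices) can be sketched as follows. This is a minimal illustration rather than the FIJI workflow used in the study: it assumes that "signal" is the mean pixel intensity inside the disk, that "noise" is the mean intensity of the surrounding band, and that the SD is the sample SD of the band pixels. The function names and the pixel values are hypothetical.

```python
from statistics import mean, stdev

SD_FACTOR = 2.3              # multiple of the noise SD required above the noise mean
MIN_CONSECUTIVE_SLICES = 2   # criterion must hold in this many adjacent slices


def slice_is_positive(signal_pixels, noise_pixels, sd_factor=SD_FACTOR):
    """Test one slice: disk mean must exceed band mean + sd_factor * band SD."""
    threshold = mean(noise_pixels) + sd_factor * stdev(noise_pixels)
    return mean(signal_pixels) > threshold


def cell_is_positive(per_slice, min_run=MIN_CONSECUTIVE_SLICES):
    """per_slice: list of (signal_pixels, noise_pixels), ordered by depth."""
    run = 0
    for signal_pixels, noise_pixels in per_slice:
        if slice_is_positive(signal_pixels, noise_pixels):
            run += 1
            if run >= min_run:
                return True
        else:
            run = 0  # the run of consecutive positive slices is broken
    return False


# Hypothetical pixel intensities for illustration only.
band = [10, 12, 11, 9, 8]
bright_disk = [20, 22, 21]
dim_disk = [11, 12, 13]
isolated_hit = cell_is_positive([(bright_disk, band), (dim_disk, band), (bright_disk, band)])
consecutive_hit = cell_is_positive([(dim_disk, band), (bright_disk, band), (bright_disk, band)])
```

Requiring two consecutive slices guards against single-slice intensity fluctuations being counted as cells, which is why the first hypothetical example is rejected and the second accepted.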
A custom MATLAB (2016a, MathWorks) script defined whether each cell was a putative "interneuron" or "pyramidal cell" according to whether tdTomato signal was detected in a cell for a majority of recorded sessions. Repeatedly activated neurons were then identified by sorting cells according to their expression in all recording sessions. For each session, a cell's x, y, z coordinates were compared to those obtained from previous sessions. If the x, y and z coordinates fell within a 20 pixel interval (25.390 µm) of existing coordinates, it was considered the same cell. If several existing coordinates fulfilled this condition, the cell was assigned to the closest set of coordinates on the x, y plane as defined by Euclidean distance. If no coordinates fulfilled this condition, the cell was considered newly activated. All variables relating to GFP+ quantification were normalized to the average number of GFP+ cells detected in home cage sessions, which was considered our baseline activation level (Figure 2d). This normalization accounted for inter-individual differences in cell density and GFP expression, as well as any possible damage caused by microprism implantation to the tissue. A typical session yielded between 500 and 3,000 GFP+ neurons.
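The cross-session matching rule and the home-cage normalization described above can be sketched in Python (the original analysis used a custom MATLAB script that is not reproduced here). The sketch assumes that the 20 pixel interval is applied to each axis separately, that a cell is compared only against cells found in earlier sessions, and that stored coordinates are not updated after a match. All names and coordinates are hypothetical.

```python
import math

TOLERANCE_PX = 20  # matching interval on each axis (25.390 µm)


def match_cell(cell, known_cells, tol=TOLERANCE_PX):
    """Return the index of the matching known cell, or None if newly activated."""
    x, y, z = cell
    candidates = [
        i for i, (kx, ky, kz) in enumerate(known_cells)
        if abs(x - kx) <= tol and abs(y - ky) <= tol and abs(z - kz) <= tol
    ]
    if not candidates:
        return None
    # Several candidates: choose the closest on the x, y plane (Euclidean distance).
    return min(
        candidates,
        key=lambda i: math.hypot(x - known_cells[i][0], y - known_cells[i][1]),
    )


def track_sessions(sessions):
    """sessions: list (one entry per session) of lists of (x, y, z) tuples.

    Returns (known_cells, history); history[i] is the set of session
    indices in which known cell i was detected.
    """
    known, history = [], []
    for s_idx, cells in enumerate(sessions):
        previous = list(known)  # only compare against earlier sessions
        for cell in cells:
            hit = match_cell(cell, previous)
            if hit is None:
                known.append(cell)
                history.append({s_idx})
            else:
                history[hit].add(s_idx)
    return known, history


def normalize_to_baseline(count, home_cage_counts):
    """Express a GFP+ count relative to the mean home-cage count."""
    return count / (sum(home_cage_counts) / len(home_cage_counts))


# Hypothetical coordinates (pixels) for three sessions.
sessions = [
    [(100, 100, 50), (300, 300, 60)],
    [(105, 98, 52), (500, 500, 10)],
    [(110, 110, 55), (298, 305, 61)],
]
known, history = track_sessions(sessions)
```

In this toy example the first cell is detected in all three sessions, so its activation history corresponds to a persistently activated cell, whereas the cell at (500, 500, 10) appears in the second session only.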

| Statistical analysis
In the main text, we report all main and interaction effects that are key to data interpretation. All data were analyzed using GraphPad Prism (RRID:SCR_002798; GraphPad Software) and SPSS (IBM SPSS Statistics, version 23.0 (2015): IBM Corp). Group data are presented as mean ± SEM.

| Behavioral data
Head entry responses were analyzed using 3-way mixed ANOVAs in SPSS. Of note, for some animals (5 paired, 4 unpaired), data recorded during conditioning were previously analyzed separately (Brebner et al., 2020).

| Imaging data
GFP+ counts were analyzed with 2-way mixed ANOVAs in Prism and 3-way mixed ANOVAs in SPSS. Following 2-way mixed ANOVAs, further post hoc tests were performed (Sidak correction) if an interaction was observed (p < .05). Log-linear analyses and chi-squared tests were performed on pooled neurons in SPSS and further post hoc procedures (Beasley & Schumacker, 1995; Bonferroni corrected) were performed for chi-squared tests if a significant interaction was observed (p < .05). Because interneurons and pyramidal cells are affected differently by glutamatergic signaling (Riebe et al., 2016), suggesting distinct Fos induction thresholds, these were analyzed separately. Of note, for some animals (4 paired and 2 unpaired mice), data recorded during conditioning were previously analyzed separately (Brebner et al., 2020).
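As an illustration of the chi-squared analysis with residual-based post hoc testing mentioned above, the following Python sketch computes the Pearson chi-squared statistic for a contingency table, the adjusted standardized residual of each cell, and a Bonferroni-corrected significance flag per cell. It is a sketch of the general approach, not the SPSS procedure used in the study. The counts are hypothetical, the function name is an assumption, and correcting across all cells of the table is one common choice rather than a detail confirmed by the text.

```python
import math


def chi_square_with_residuals(table, alpha=0.05):
    """Pearson chi-squared test with adjusted-residual post hoc tests.

    table: list of rows of observed counts (e.g. Group x Activation History).
    Returns (chi2, dof, cells), where each cell entry is
    (row, col, adjusted_residual, two_sided_p, significant_after_bonferroni).
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_tot)
    alpha_adj = alpha / (n_rows * n_cols)  # Bonferroni correction across cells

    chi2 = 0.0
    cells = []
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_tot[i] * col_tot[j] / n
            diff = table[i][j] - expected
            chi2 += diff ** 2 / expected
            # Adjusted standardized residual, approximately N(0, 1) under H0.
            se = math.sqrt(expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n))
            z = diff / se
            p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
            cells.append((i, j, z, p, p < alpha_adj))
    dof = (n_rows - 1) * (n_cols - 1)
    return chi2, dof, cells


# Hypothetical pooled counts: rows = paired / unpaired,
# columns = four activation-history categories.
example_table = [
    [120, 340, 150, 160],
    [200, 300, 170, 140],
]
chi2, dof, cells = chi_square_with_residuals(example_table)
```

The omnibus p-value is not computed here because it requires the chi-squared distribution function, which the standard library does not provide; a statistics package would supply it.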

| Intrinsic excitability data
Firing capacity and I/V curves of GFP+ and GFP− pyramidal neurons between E1 and E7+ were analyzed using a 3-way mixed ANOVA (factors of Condition, Cell Type and Current).
Active and passive membrane kinetics were analyzed using 2-way ANOVAs (factors of Condition, Cell Type).

| Extinction learning attenuates CS-evoked responding
We first trained "Paired" FGGT mice (n = 6) to associate an auditory cue (CS) with sucrose solution delivery (US) across 12 acquisition sessions (Figure 1a). "Unpaired" FGGT mice (n = 6), in contrast, received CS and US exposure non-contiguously and formed no appetitive association (Figure 1b). We observed a 3-way interaction between Acquisition Session × Group × Cue (F(11,110) = 2.98, p < .01). Thus, paired mice learned the CS-US association over the conditioning sessions. Following acquisition of a CS-US association, mice underwent 7 "Extinction" sessions in which mice in both paired and unpaired conditions received repeated cue presentations in the absence of US delivery (Figure 1a). During extinction, we observed significant interactions of Group × Cue (F(1,10) = 13.56, p = .004) and Session × Group (F(6,60) = 5.96, p < .001). Mice in the paired condition decreased their overall responses as extinction progressed (Figure 1c).

| Extinction learning recruits a neuronal ensemble from a wider pool of interneurons activated in early extinction
We used 2P-imaging in microprism-implanted FGGT mice to characterize neuronal activation patterns among pyramidal cells and interneurons in layers II/III of the dmPFC following acquisition and extinction sessions (Figure 2a-c). In order to assess baseline GFP expression, we first examined the number of GFP+ pyramidal cells and interneurons per mm³ in mice that had been in the home cage (HC) for at least 24 hr. Imaging sessions were conducted both before (HC1) and after (HC2) mice underwent behavioral training. We observed no significant interaction effect of Group × Session for pyramidal cells (F(1,7) = 0.06, p = .812) nor main effects of Group (F(1,7) = 2.22, p = .180) and Session (F(1,7) = 5.50, p = .052). Similarly, in interneurons we observed no significant interaction effect of Group × Session (F(1,7) = 2.74, p = .142; Figure 2d), nor main effects of Group (F(1,7) = 1.55, p = .253) and Session (F(1,7) = 4.68, p = .067). Thus, behavioral training did not modulate baseline GFP expression in either cell type. In further analyses, to account for inter-individual differences in cellular density and imaging quality, the number of HC1 and HC2 GFP+ pyramidal cells and interneurons were averaged for each mouse and used to normalize any subsequent GFP+ cell counts.
Repeated activation is thought to consolidate neurons into a stable ensemble that mediates learned associations (Brebner et al., 2020; Mattson et al., 2008). Thus, we investigated whether such a neuronal ensemble was recruited during extinction from a pool of candidate neurons activated in E1. To this end, we assessed the number of GFP+ neurons in two distinct "Activation History" categories: those that were recruited from the first extinction session and repeatedly activated in later sessions (E1+| E3+ E7+), and those recruited in the two later sessions, but not the first session (E1−| E3+ E7+; Figure 3c). We observed a significant interaction of Activation History × Group in interneurons (F(2,14) = 6.59, p < .05), but not pyramidal cells (F(2,14) = 0.43, p = .535). Post hoc testing in interneurons revealed a significant increase in the number of stable, E1+| E3+ E7+ interneurons in paired mice, compared to unpaired mice (P = 37.7 ± 2.2; UP = 24.5 ± 3.3; p < .01). Thus, extinction recruited a stable interneuron ensemble from a larger pool of interneurons activated during the first extinction session.

FIGURE 1 Experimental timeline and performance during acquisition of a CS-US association and extinction learning. (a) Timeline of acquisition, extinction and imaging. (b) Head entries into the magazine during the CS (cue) compared to ITI (no cue) periods during acquisition and (c) extinction of CS-evoked responding in paired (P) and unpaired (UP) FGGT mice. All data are expressed as mean ± SEM; P n = 6, UP n = 6

FIGURE 2 Experimental timeline, methods of 2-Photon imaging and baseline GFP expression. GFP expression was longitudinally monitored in pyramidal cells and interneurons. (a) Microprism placement for dmPFC imaging. (b) Representative in vivo 2-Photon image of dmPFC from Fos-GFP × GAD-tdTomato (FGGT) mice (green arrow: GFP; gray arrow: tdTomato; blue arrow: GFP+ tdTomato). GFP+ neurons were selected by comparing signal intensity to surrounding background. (c) Imaging timeline and schematic representation of imaging session in head-fixed mice following behavioral training under freely moving conditions (S1, S5, S11; and E1, E3, E7) or from home cage (HC1, HC2). (d) Number of GFP+ pyramidal cells (green) and interneurons (red) per mm³ in imaging sessions taking place directly from home cage both before (HC1) and after (HC2) behavioral training. Data are expressed as mean ± SEM. Paired (P) n = 5, unpaired (UP) n = 4

| Extinction learning alters activation patterns of pyramidal cells and interneurons
Our findings demonstrate that extinction learning leads to the recruitment of an interneuron ensemble from neurons activated in E1. However, we observed no differences in the number of activated pyramidal cells in mice in the paired compared to unpaired condition. Interneurons have been shown to control excitatory ensembles during learning (Morrison et al., 2016; Stefanelli et al., 2016). Thus, having established that the repeatedly activated interneuron ensemble is recruited from E1, we next examined at a population level how extinction learning altered the likelihood of neuronal reactivation following activation in E1. Among E1-activated neurons, we assessed the proportion of neurons that were reactivated in E3 and E7 (E1+| E3+ E7+), not reactivated in E3 and E7 (E1+| E3− E7−), as well as neurons activated in E1 and E3, but not E7 (E1+| E3+ E7−) and activated in E1 and E7, but not E3 (E1+| E3− E7+; Figure 3d). There was a significant interaction of Activation History × Group for pyramidal cells (χ²(3) = 52.837, p < .001) but not interneurons (χ²(3) = 3.12, p = .375). Notably, there was a significantly lower proportion of pyramidal cells activated in E1 that were activated across the subsequent extinction sessions (E1+| E3+ E7+) in the paired, compared to unpaired condition. Furthermore, there was a higher proportion of pyramidal cells activated during initial memory recall (E1) that were not reactivated during later extinction sessions (E1+| E3− E7−; p < .05). Thus, extinction learning significantly reduced the likelihood that pyramidal cells activated in E1 became persistently reactivated as extinction learning progressed.
These findings suggest that although extinction learning increases the number of E1-activated interneurons that become persistently reactivated, this learning is not associated with an altered likelihood of persistent reactivation of E1-activated interneurons at the population level.

FIGURE 3 Extinction learning recruits a stable interneuron ensemble from the initial extinction session.
Extinction learning has been shown to recruit distinct neuronal ensembles compared to those recruited in appetitive conditioning (Warren et al., 2016). Next, we examined whether extinction learning differentially modulated the recruitment of the ensemble that formed during the acquisition of the CS-US pairing. These excitatory ensembles are defined as having been persistently activated (GFP+) during the early, middle and late phases of acquisition, namely on the 1st (S1), 5th (S5), and 11th (S11) acquisition sessions (Brebner et al., 2020). At a population level, we compared the proportion of persistently activated neurons during extinction (E1+| E3+ E7+) with the proportion of neurons also persistently activated during acquisition (i.e., activated in early, middle and late conditioning; Figure 3e; Brebner et al., 2020). A chi-squared analysis of Group × Acquisition History revealed that the proportion of persistently activated extinction neurons that had been persistently activated during acquisition decreased in the paired condition for both pyramidal cells (paired: 646 of 1,115, unpaired: 1,122 of 1,565; χ²(1) = 11.43, p < .01) and interneurons (paired: 69 of 115, unpaired: 48 of 146; χ²(1) = 29.12, p < .001). Thus, during extinction the likelihood that persistently activated pyramidal cells and interneurons had been previously persistently activated during acquisition was reduced.

| Extinction learning does not modulate the intrinsic excitability properties of behaviorally activated pyramidal cells
It is possible that our observed alterations in pyramidal cell activation patterns at a population level may be accompanied by intrinsic excitability alterations within activated neurons. Moreover, we have previously observed that Fos-expressing neurons may exhibit modulations in excitability independently of changes in the overall number of Fos-expressing neurons following reward cue exposure (Ziminski et al., 2018). As such, we determined whether extinction learning modulated the excitability of activated pyramidal cells in Fos-GFP mice (Figure 4a). We analyzed the frequency of action potentials elicited by positive current injection in GFP+ and GFP− neurons following the initial extinction session (E1) or following the establishment of extinction learning (E7). We observed no difference in the frequency of action potentials as a function of group or cell type across current injections (Figure 4b; Condition × Current × Cell Type; F(17,1003) = 0.20, p = 1.00). We also observed no alterations in the input resistance, as measured by changes in the Current/Voltage (I/V) curves (Condition × Current × Cell Type; F(25,1475) = 0.21, p = 1.00). Furthermore, we observed no 2-way interactions (Condition × Cell Type) in any measured passive or active property (Figure 4c, Table 1), although we observed a main effect of Cell Type in the AP peak (F(1,58) = 4.46, p < .05) and half-width (F(1,58) = 4.45, p < .05). Overall, our findings indicate that extinction learning did not modulate the intrinsic excitability of activated dmPFC pyramidal cells.

| DISCUSSION
The dmPFC is widely regarded to have a role in driving the expression of learned behaviors (Gourley & Taylor, 2016;Moorman, James, McGlinchey, & Aston-Jones, 2015;Peters, Kalivas, & Quirk, 2009). To date, few studies have specifically examined the role of both pyramidal cells and interneurons in the dmPFC in extinction learning. Here, we extend the findings from our recent study in Brebner et al. (2020), by additionally demonstrating that the extinction of CS-evoked food seeking is associated with the recruitment of a stable ensemble of interneurons in this area. This ensemble was recruited from a wider pool of neurons activated early in extinction and was then repeatedly reactivated during subsequent extinction sessions, suggesting its role in weakening the expression of an appetitive conditioned response. This same pattern of activation did not generalize to pyramidal cells as these neurons exhibited a decreased likelihood of persistent reactivation following E1 activation. Additionally, we found no evidence that extinction learning modulated the intrinsic excitability in activated pyramidal cells. Finally, we found that activation patterns of both interneurons and pyramidal cells are modified in extinction learning, such that the overall recruitment of the ensemble formed during acquisition of a CS-US association was reduced. Taken together, our results provide new insights into the neuronal ensemble mechanisms underlying inhibitory conditioning during extinction of appetitive behaviors in the dmPFC.

| Extinction learning recruits an interneuron ensemble
Decades of animal learning research demonstrate that extinction learning typically fails to destroy the original associative memory trace. Under many circumstances, such as after reexposure to a US (reinstatement) or passage of time (spontaneous recovery), the original CS-US association survives (Bouton, 1993;Pavlov, 1927;Pearce & Hall, 1980;Peters et al., 2009;Ziminski et al., 2017). From this perspective, extinction results not from an erasure of the CS-US memory, but rather from an accumulation of extinction learning and the formation of a competing "CS-No US" memory representation (Bouton, 1993, 2007). Therefore, the efficacy of extinction learning to suppress behavior (e.g., cue-evoked food-seeking) relies on the probability of activating and retrieving the newly acquired inhibitory extinction versus the original excitatory CS-US meaning.
Interneuron activation has been associated with multiple forms of associative learning, including appetitive and aversive conditioning (Courtin et al., 2014;Doron & Rosenblum, 2010;Pinto & Dan, 2015;Stefanelli et al., 2016). Furthermore, we and others have shown that such forms of learning recruit persistently activated populations of neurons called neuronal ensembles (Brebner et al., 2020;Cao et al., 2015;Czajkowski et al., 2014;Mattson et al., 2008;Tayler, Tanaka, Reijmers, & Wiltgen, 2013). Our results provide an important new step by demonstrating that (a) extinction learning recruits a stable interneuron ensemble in the dmPFC, and (b) this ensemble was preferentially recruited from interneurons activated during the initial phase of extinction (i.e., extinction session 1), when behavioral responding is generally elevated. Given the aforementioned view of extinction as a form of associative learning whereby a new "CS-No US" association is established to compete with and inhibit earlier learned performance (Bouton, 2004;Rescorla, 1993), we propose that the development of an interneuron ensemble in the dmPFC during extinction may be involved in encoding this new inhibitory association. Furthermore, we observed increased activation of interneurons throughout extinction sessions, in line with a number of studies by others suggesting a role for inhibitory signaling during extinction learning (Courtin et al., 2014;Sparta et al., 2014;Zou et al., 2016). Such increases were not detected among dmPFC pyramidal cells.
Although our method allowed us to differentiate between pyramidal cells and interneurons, it did not differentiate between the many interneuron subtypes. The GAD-tdTomato mouse, which expresses tdTomato in GAD65-expressing neurons, labels a heterogeneous cortical interneuron population. As such, we cannot determine precisely which interneuron subtypes may have been part of the interneuron ensemble important for extinction learning. In the cortex, neurons expressing parvalbumin (PV+), somatostatin (SOM+) and vasoactive intestinal peptide (VIP+) represent the majority of GABAergic interneurons (Rudy, Fishell, Lee, & Hjerling-Leffler, 2011;Xu, Roby, & Callaway, 2010). In the mPFC, these neuronal subtypes have been implicated in distinct aspects of reward-related learning (Gaykema et al., 2014;Kim et al., 2016;Kvitsiani et al., 2013;Pinto & Dan, 2015). Of relevance to our study, generalized activation of PV+ interneurons of the dmPFC accelerates extinction of reward-seeking (Sparta et al., 2014). Furthermore, the inhibition of dmPFC PV+ interneurons following extinction has been shown to promote the expression of conditioned fear (Courtin et al., 2014). Together, these findings suggest that PV+ interneuron activity in the dmPFC has a role in the encoding and/or expression of inhibitory forms of learning. Thus, we predict that PV+ interneurons are a key component of the observed persistently reactivated inhibitory ensemble. Crucially, different interneuron subtypes may show similar or divergent activation patterns in extinction. Thus, further work will be necessary to delineate the exact composition of the interneuron ensemble we observed.
Of note, in this study we used an immediate-early gene approach to label recently activated ensembles. This is an effective method to label behaviorally relevant neurons that show strong activity accumulated across an entire test session (Cruz, Javier Rubio, & Hope, 2015). One limitation of this approach is that GFP expression may be due to events prior to the behavioral session. However, while alternative methods (e.g., genetically encoded calcium indicators) have the benefit of increased temporal resolution, they are not suitable for labeling robustly activated neurons for further characterization. As such, our choice of method aimed both to identify behaviorally relevant neurons across multiple timepoints of extinction learning and to characterize their electrophysiological properties.

| The effects of increased inhibitory drive on pyramidal cells during extinction
In the ventral mPFC, extinction learning has been shown to activate a distinct ensemble compared to that which was activated during initial acquisition learning (Warren et al., 2016). Here, we extend these findings by demonstrating that activation of pyramidal cells and interneurons at a population level is also altered in the dmPFC, such that the activation of the previously established acquisition ensemble becomes less prominent as a function of extinction experience. More specifically, among interneurons persistently activated during extinction training, the proportion of those neurons with a previous history of persistent activation in acquisition is decreased. Thus, we suggest that extinction learning recruits a stable interneuron ensemble from a wider pool of neurons activated in the initial extinction session and that this ensemble is partly distinct from those interneurons that were persistently activated in conditioning (Brebner et al., 2020).
On a population level, we observed that pyramidal cells that are persistently activated in extinction are less likely to have been persistently activated during acquisition. Moreover, extinction learning was associated with an increased likelihood that pyramidal cells activated in the initial extinction session, when behavioral responding is relatively high, were not reactivated in subsequent sessions. Thus, these findings suggest a shift in the activation patterns of pyramidal cells as mice transitioned from acquisition to extinction, such that some of the pyramidal cells persistently activated during acquisition or activated in early extinction were no longer persistently activated throughout extinction. Recent studies have demonstrated that both PV+ and SOM+ interneurons have a role in controlling ensemble size in the amygdala and hippocampus, respectively (Morrison et al., 2016;Stefanelli et al., 2016). Therefore, in the dmPFC an extinction-related interneuron ensemble may come to regulate these pyramidal cell activation patterns. However, further work will be necessary to determine the relationship between the stable interneuron ensemble and the activation of excitatory pyramidal neurons during extinction.
In contrast to recruitment, we observed no modulation of intrinsic excitability in activated pyramidal cells across extinction. We have previously observed that the excitability of activated pyramidal cells is increased during the initial, but not the later, training phase of CS-US acquisition (Brebner et al., 2020). In this regard, it is interesting that such excitability alterations were not observed following the initial acquisition of a new CS-No US association. This suggests that excitability alterations in dmPFC pyramidal cells may be less important for updating the contingency of CS-US associations from an excitatory to an inhibitory association. Instead, these alterations may serve to mediate processes early in excitatory conditioning that facilitate the establishment of CS-US associations (e.g., increased exploration of the learning environment and attentional resources; Brebner et al., 2020;Bryden, Johnson, Tobia, Kashtelyan, & Roesch, 2011).
We also did not observe any generalized regulation of pyramidal cell excitability following extinction, in contrast to previous studies that examined these properties in the prelimbic cortex of the mPFC (Hayton, Olmstead, & Dumont, 2011;Song, Ehlers, & Moyer, 2015). This difference may be due to our recording site primarily encapsulating the anterior cingulate cortex, where the excitability of pyramidal cells may be less sensitive to extinction learning. In this study, we did not examine the excitability properties of activated interneurons. Hence, it remains to be seen whether such properties are dynamically regulated in interneurons during extinction learning, and how these properties impact the surrounding neurons. Such characterizations are challenging due to the relatively low proportion of Fos-expressing interneurons compared to pyramidal cells, since only ~10% of cue-activated Fos-expressing neurons in the PFC are interneurons (Warren et al., 2016;Ziminski et al., 2017).

| Potential roles of increased inhibitory drive in extinction
Much evidence indicates that CS-US memories are inhibited, rather than forgotten (Bouton, 1993;Pavlov, 1927;Pearce & Hall, 1980;Peters et al., 2009;Ziminski et al., 2017). In line with this view, we found a reduction in the proportion of pyramidal cells persistently activated in extinction that also had a history of persistent activation during acquisition. We also found an increase in the proportion of pyramidal cells that were activated in the initial extinction session but were no longer persistently activated throughout extinction. We hypothesize that these activation patterns occurred as a result of local inhibition from interneurons. Thus, in the dmPFC, the inhibitory ensemble that emerges may be involved in inhibiting the previously established CS-US memory trace.
This suppression could occur through actions on downstream targets, as altering pyramidal cell activation may result in pathways that were previously activated by the dmPFC not being re-activated following extinction experience. For example, this could affect outputs that may have had a role in promoting behavioral vigor in conditioning, such as the projections to the nucleus accumbens (Otis et al., 2017;Parkinson et al., 2000). The dmPFC also has a role in attention (Bryden et al., 2011;Totah, Kim, Homayoun, & Moghaddam, 2009) and cue-response selectivity (Cardinal et al., 2002;Parkinson et al., 2000). If outputs to attentional processes are inhibited or altered, attentional and discriminatory networks that may have been consolidated during conditioning would be prevented from becoming re-engaged during extinction. Thus, decreased activation of previously strengthened dmPFC networks targeted in conditioning may be crucial in promoting efficient extinction learning and subsequently adapting behavioral responses to changing predictors of food availability.

| CONCLUSION
Extinction learning is necessary for adapting to a dynamically changing environment in which a cue no longer predicts food availability. Little attention has been paid to the role of dmPFC ensembles, especially with regard to the role of interneurons, in controlling extinction learning following the establishment of an appetitive CS-US association. Here, we revealed heightened and persistent interneuron activation in the dmPFC throughout extinction learning. Furthermore, we demonstrate that extinction learning recruits a stable interneuron ensemble in the dmPFC that emerges from a wider interneuron population recruited in the first extinction session. Such patterns did not generalize to activated pyramidal cells, and their excitability properties were not modulated during extinction learning. These recruitment patterns and physiological alterations are different to those observed in excitatory appetitive conditioning (Brebner et al., 2020), suggesting different neuronal mechanisms are involved in the acquisition/maintenance and extinction of behaviors controlled by food-cue associations.