Consistent behavioral phenotype differences between inbred mouse strains in the IntelliCage

Authors


PD Dr S. Krackow, NewBehavior AG, Technoparkstrasse 1, CH-8005 Zürich, Switzerland. E-mail: sven.krackow@newbehavior.com

Abstract

The between-laboratory effects on behavioral phenotypes and spatial learning performance of three strains of laboratory mice known for divergent behavioral phenotypes were evaluated in a fully balanced and synchronized study using a completely automated behavioral phenotyping device (IntelliCage). Activity pattern and spatial conditioning performance differed consistently between strains, i.e. exhibited no interaction with the between-laboratory factor, whereas the gross laboratory effect showed up significantly in the majority of measures. It is argued that overall differences between laboratories may not realistically be preventable, as subtle differences in animal housing and treatment will not be controllable, in practice. However, consistency of strain (or treatment) effects appears to be far more important in behavioral and brain sciences than the absolute overall level of such measures. In this respect, basic behavioral and learning measures proved to be highly consistent in the IntelliCage, therefore providing a valid basis for meaningful research hypothesis testing. Also, potential heterogeneity of behavioral status because of environmental and social enrichment has no detectable negative effect on the consistency of strain effects. We suggest that the absence of human interference during behavioral testing is the most prominent advantage of the IntelliCage and suspect that this is likely responsible for the between-laboratory consistency of findings, although we are aware that this ultimately needs direct testing.

Significant between-laboratory variation in measures of behavioral phenotypes is notorious in mice (Wahlsten et al. 2003). In spite of serious attempts to overcome differences in animal handling and environmental impacts between laboratories (Crabbe et al. 1999), this appears not feasible given the subtle nature of effects that could potentially affect animal behavior. To mention a few, unavoidable background noise and its daily pattern in modern concrete buildings, diversity of breeding devices and maintenance procedures, idiosyncrasies of animal breeding staff behavior as well as other variables, such as olfactory cues, cannot reasonably be controlled for, but are most probably of impact, particularly on activity patterns. And overall activity level will affect most quantitative behavioral traits, including cognitive measures based on response rates.

On the other hand, overall differences in quantitative measures may actually not be of any decisive importance. Rather, in nearly all conceivable contexts, it is the consistency of effects, be it of treatments or strain identities, that is crucial to allow for valid research predictions and, thereby, meaningful hypothesis testing. Unfortunately, even strain effects have been found to vary significantly between laboratories (Crabbe et al. 1999). As a major cause for between-laboratory effects, human interactions with animals have been suggested to interfere with behavioral performances in unpredictable directions (Chesler et al. 2002). Also, it has been argued that conventional standardization experiments boost any laboratory-specific deviations as an effect of extreme homogeneity of animal status (Richter et al. 2009). Hence, we aimed at testing whether phenotyping in the IntelliCage would be consistent between laboratories, as human interaction is excluded in the IntelliCage, where trait assessment and animal treatment are fully computerized and automated.

In the current study, we compared behavioral and cognitive laboratory strain phenotypes in the IntelliCage with trait differences known from ‘traditional’ experimental procedures, and secondly, we evaluated the significance of laboratory identity for strain effects on behavioral and cognitive traits, in the IntelliCage. To achieve this, three strains of mice that have repeatedly been used in between-lab standardization studies (Richter et al. 2009) were tested in mixed-strain groups for differences in activity and cognitive traits, namely, spatial preference and spatial reversal learning.

Material and methods

To maximize standardization of procedures, four research groups in different European institutions located in different European countries were simultaneously provided with same-sex, same-age mice from the same supplier, and simultaneously ran identical experimental protocols under laboratory conditions according to common experimental standards using a fully balanced design. Furthermore, experimental data were uploaded onto a common databank (FBI.lab-animal.net) and analyzed by a non-participating collaborator (S.K.), to assure equal data manipulation and analysis procedures for all experiments.

Study design

For the experiments, four mice of each of three strains were placed into each of two IntelliCages in each of four laboratories (EVOTEC: Evotec Neurosciences, Hamburg, Germany; ISS: Section of Behavioural Neurosciences, Dipartimento di Biologia cellulare e Neuroscienze, Istituto Superiore di Sanità, Roma, Italy; NKAR: Karolinska Institutet, Alzheimer's Disease Research Center, Stockholm, Sweden; UNIZH: Institute of Anatomy, University of Zürich, Switzerland), at three consecutive trial periods, starting 1 October, 30 October and 26 November 2007. The laboratories ran identical experimental protocols according to the schedule given in Table 1.

Table 1. Experimental procedure as explained in the Material and methods (Experimental setup and protocols)

| Protocol | Trial days | Paradigm |
|---|---|---|
| Free Adaptation | 7 | Doors open |
| Nosepoke Adaptation | 3 | Nosepoke needed for access to water |
| Drinking Session Adaptation | 4 | Water access restricted to 2 h/night |
| Place Learning | 4 | Water available in one corner per animal |
| Reversal Learning | 3 | Water access shifted to another corner |

The study design therefore exhibited maximum synchronization between laboratories, identical experimental procedures, and was fully balanced with respect to contributions of laboratories, IntelliCages and strains to data variance.

Experimental setup and protocols

Experiments were run in IntelliCages, which consist of a polycarbonate cage (20.5 cm high × 58 × 40 cm at top, 55 × 37.5 cm at bottom) containing a conditioning chamber in each corner (Fig. 1). Each chamber allows access to two water bottles for drinking at left and right sides, by means of a closable round opening. Pokes at these openings are registered by a light-beam sensor and termed ‘Nosepokes’. Mice can enter the corners through a ring containing a transponder reader antenna, and their presence is confirmed by a temperature-differential sensor. Antenna readings and presence signals determine what is termed a ‘Visit’. The apparatus is controlled by user-designable IntelliCage software 2.1 that allows for door openings in response to animal behavior as registered by the sensors. For instance, doors might be closed when no mouse is present in a corner, but opened according to whether a predefined combination of Visit and/or Nosepoke pattern and transponder identity occurs.

Figure 1.

View of conditioning corner exhibiting presence sensor, light-beam sensors at the opening giving access to water, LED lights that could provide conditional stimulus and air puff valve that could be used for punishment. Animals could enter corners through an antenna ring detecting the transponder number, and access to water is controlled by a door mechanism responding to animal behavior and day pattern settings.

The following protocols were applied by means of appropriately designed control modules for the IntelliCages.

  1. Free Adaptation: Mice had free access to water in all corners, as all doors were opened.
  2. Nosepoke Adaptation: Mice had to perform a Nosepoke, which opened the respective door, in order to drink. Opened doors were closed automatically after 7 seconds. Only one door opening per Visit was allowed, i.e. mice had to visit again (any corner) to get access to more water.
  3. Drinking Session Adaptation: In addition to the Nosepoke Adaptation protocol, access to water was restricted to two 1-h drinking sessions per night. The first drinking session began 3 h after dark phase onset, the second 4 h after the end of the first session. Except for drinking sessions all doors remained closed.
  4. Place Learning: Same as 3, but with access to the water restricted to one corner. Drinking corner assignment was randomized within and balanced between strains, i.e. each corner had assigned one mouse of each strain.
  5. Reversal Learning: The water-bearing corner was replaced by the corner opposite to the one in 4.

Animals and husbandry

The 4 laboratories received 78 female mice each, comprising 26 animals of each of 3 strains, delivered in 6 same-strain filter boxes. Strains were C57BL/6NCrl (hereafter: B6) and DBA/2NCrl (D2) inbred mice, and their B6 maternal cross B6D2F1/Crl (F1). Mice of about 8 weeks of age were purchased and shipped from Charles River Laboratories, Sulzfeld, Germany, and arrived within the first week of August 2007, when they were initially kept in three same-strain groups.

All animals were maintained under standard laboratory animal facility conditions with 12:12 h reversed light/dark cycle, at 45–65% relative humidity and at 20–24°C. Light phase is termed ‘day’, dark phase is termed ‘night’. Mice were divided into 18 groups of 4 same-strain females kept in type II polycarbonate cages provided with opaque nest boxes for cover. An additional three cages with two same-strain females each were kept in case an animal had to be replaced during an experiment. Animal bedding was changed on a weekly basis, and food and water were accessible ad lib. Weights were taken at arrival and at cleaning dates.

Before September 2007, a radio-frequency identification (RFID) transponder was subcutaneously injected in the dorsocervical area of each animal, for individual recognition in the IntelliCages. The injection was performed under isoflurane or CO2/O2 anesthesia. At each start of a trial period, four mice from one cage of each strain were transferred into each IntelliCage, simultaneously. Within IntelliCages, animals had ad lib access to food and could access water according to experimental protocol (cf. Study design and Experimental setup and protocols).

All experiments have been carried out in accordance with the Guidelines laid down with the European Communities Council Directive of 24 November 1986 (86/609/EEC) and were approved of by the respective authorities: NKAR experiments were conducted in accordance with the policies on animal ethics and welfare of the Southern Stockholm Ethics Committee; ISS experimental procedures have been carried out in accordance with the Italian legislation (Decreto L.vo 116/92); UNIZH experiments were conducted under the license number 120/2005 issued by veterinary office of University of Zürich; EVOTEC experiments had the approval of the local institutional animal care committee ‘Amt für Gesundheit und Verbraucherschutz’, Hamburg, Germany.

Statistical model

The study design lends itself to a split-plot model, where the laboratories (Lab) represent the between-plot factor, mouse strain (Strain) the within-plot factor and each IntelliCage (IC) the plots (i.e. a total of six plots per lab). The fixed effects of foremost interest here are the effect of Strain as well as the interaction effect of Strain and Lab, i.e. the questions of how strain traits differ and whether strain effects are affected by laboratory identity.

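The following general linear model (GLM) was therefore used:

$$y_{ijk} = \mathrm{INT} + \mathrm{Lab}_i + \mathrm{Strain}_j + (\mathrm{Lab} \times \mathrm{Strain})_{ij} + \mathrm{IC(Lab)}_{k(i)} + (\mathrm{Strain} \times \mathrm{IC(Lab)})_{jk(i)} + e_{jk(i)}$$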

Here, y represents the values of the dependent variable, INT the intercept of the model and e the residual error. An effect given in parentheses nests the effect preceding it, e.g. IC(Lab) denotes IntelliCages nested within laboratories.

Significance of the fixed effect ‘Strain’ and of the interaction ‘Lab × Strain’ is determined by F-tests for which the mean sum of squared deviations of the Strain × IC(Lab) interaction represents the appropriate denominator variance (40 df). The gross Lab effect is determined using the nested IC(Lab) term as denominator variance (20 df). For learning variables (PlaceErrors, ReversalErrors, RelearnScore), one or both of the above denominator variances were estimated to be 0 in a mixed model. Hence, error variances with a deviating number of degrees of freedom, estimated according to Satterthwaite's method, are used (Table 6).
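The denominator degrees of freedom quoted above follow from the design counts. The sketch below is only an illustration of that arithmetic, not the analysis code used in the study (which was run in SAS); the function name `split_plot_ddf` and its arguments are hypothetical.

```python
def split_plot_ddf(n_labs: int, n_cages: int, n_strains: int) -> dict:
    """Numerator and denominator df for the three F-tests of the split-plot model.

    n_cages is the total number of IntelliCage plots over all labs
    (here: 4 labs x 2 cages x 3 trial periods = 24).
    """
    # IC(Lab): plots nested within labs, error term for the gross Lab effect
    ic_within_lab = n_cages - n_labs
    # Strain x IC(Lab): error term for Strain and Lab x Strain
    strain_by_ic = (n_strains - 1) * ic_within_lab
    return {
        "Lab": (n_labs - 1, ic_within_lab),
        "Strain": (n_strains - 1, strain_by_ic),
        "Lab x Strain": ((n_labs - 1) * (n_strains - 1), strain_by_ic),
    }


full_design = split_plot_ddf(n_labs=4, n_cages=24, n_strains=3)
print(full_design)
```

For the full design this gives F(3,20) for Lab, F(2,40) for Strain and F(6,40) for Lab × Strain. With two plots missing (22 cages), the same arithmetic yields denominators of 18 and 36.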

Table 6. ANOVA table for spatial learning parameters during Place Learning and Reversal Learning phases (N = 287)

| Variable | Lab F(3,ddf) | Lab ddf | Lab P< | Strain F(2,ddf) | Strain ddf | Strain P< | Lab × Strain F(6,ddf) | Lab × Strain ddf | Lab × Strain P< | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| PlaceErrors | 2.86 | 20.1 | 0.07 | 34.23 | 253 | 0.0001 | 1.86 | 252 | 0.09 | N = 285 |
| ReversalErrors | 4.33 | 272 | 0.01 | 12.12 | 272 | 0.0001 | 0.84 | 272 | 0.55 | N = 284 |
| RelearnScore | 4.01 | 20.1 | 0.03 | 35.13 | 252 | 0.0001 | 1.20 | 252 | 0.31 | N = 284 |

  1. Note that data could not be normalized and significance levels might therefore be conservatively underestimated.

Note that the statistics on the essentially random ‘Lab’ effect are reported here for completeness, but its nature limits further interpretation. Furthermore, for comparison with former studies one may check effect size measures in Table S1 (supporting information). Statistical tests and parameter estimations were performed using SAS software 9.13 (SAS Institute, Inc. 2006).

Data set and variables extracted

IntelliCage software provides for a raw data set that incorporates start and end datetimes of Visits (10 ms resolution) together with descriptive information of each Visit that allows for extracting the variables listed below.

Apart from the design and ID variables (Lab, Strain, Cage, Animal, Tag), the following dependent variables were extracted from the raw data for each animal during the different trial phases by means of R scripts (R Development Core Team 2007).

Initial phase of Free Adaptation

To evaluate potential differences between strains in reaction to a (socially) stressful situation, activity level and spatial behavior were compared using two variables:

  1. InitialActivity: number of Visits within first half hour of experiment.
  2. ExplorePhase: time from first Visit to visiting all corners.

Last 2 days of Free Adaptation

The last 2 days of Free Adaptation were used to extract measures of daily activity in a settled environment. To this end, the following four variables were extracted:

  1. NocturnalActivity: Visits during nights,
  2. DiurnalActivity: Visits during days,
  3. VisitDuration: average duration of Visits,
  4. DayPattern: (1 − 2)/(1 + 2), i.e. (NocturnalActivity − DiurnalActivity)/(NocturnalActivity + DiurnalActivity).

The last variable spans from −1 (all activity during day phase) to 1 (all activity during night), with 0 indicating identical night and day activity.
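As a minimal sketch of this index, assuming it is the normalized difference (NocturnalActivity − DiurnalActivity)/(NocturnalActivity + DiurnalActivity) implied by the range described above, the calculation could look as follows. The function name `day_pattern` and the example counts are hypothetical and do not come from the study data.

```python
def day_pattern(nocturnal_visits: float, diurnal_visits: float) -> float:
    """Normalized night/day contrast of Visit counts, ranging from -1 to 1."""
    total = nocturnal_visits + diurnal_visits
    if total == 0:
        # The index is undefined for an animal without any Visits
        raise ValueError("DayPattern is undefined without any Visits")
    return (nocturnal_visits - diurnal_visits) / total


# Hypothetical counts for illustration only
print(day_pattern(300, 100))  # activity concentrated in the night -> 0.5
print(day_pattern(100, 100))  # identical night and day activity -> 0.0
```

An animal that is active only at night scores 1, and one that is active only during the day scores −1.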

Nosepoke Adaptation

During Nosepoke Adaptation, the animals had to learn to poke at doors in order to get access to water. As this might lead to strain-specific behavioral reactions, we analyzed these variables:

  1. Activity: Visits per hour,
  2. PokeProportion: Visits with Nosepokes/total number of Visits,
  3. PokesPerVisit: number of Nosepokes/number of Visits with Nosepokes,
  4. PokeDuration: average duration of a Nosepoke.

Last night of Drinking Session Adaptation

Activity was compared between drinking and non-drinking times for the last night (12 h) when mice were already adapted to the temporal water access limitation.

  1. DrinkActivity: number of Visits during drinking sessions,
  2. DryActivity: number of Visits outside drinking sessions,
  3. DrinkPreference: (1 − 2/5)/(1 + 2/5), i.e. (DrinkActivity − DryActivity/5)/(DrinkActivity + DryActivity/5).

DryActivity is divided by five for the DrinkPreference calculation, as non-drinking time is 10 h, whereas drinking sessions add up to 2 h.
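The time scaling can be made explicit in a short sketch. It assumes DrinkPreference is the normalized difference between DrinkActivity and DryActivity/5; the function name `drink_preference`, its parameters and the example counts are hypothetical.

```python
def drink_preference(drink_visits: float, dry_visits: float,
                     drink_hours: float = 2.0, dry_hours: float = 10.0) -> float:
    """Contrast of Visit counts inside vs. outside drinking sessions.

    Dry-time Visits are rescaled to the length of the drinking time,
    which for 2 h vs. 10 h amounts to dividing DryActivity by five.
    """
    scaled_dry = dry_visits * drink_hours / dry_hours
    total = drink_visits + scaled_dry
    if total == 0:
        raise ValueError("DrinkPreference is undefined without any Visits")
    return (drink_visits - scaled_dry) / total


# Hypothetical counts: 40 Visits in 2 h of drinking time, 50 Visits in 10 h of dry time
print(drink_preference(40, 50))
```

A value of 0 then indicates the same visiting rate per hour inside and outside drinking sessions, while an animal that never visits during drinking sessions scores −1.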

Place Learning and Reversal Learning

Learning was assessed by the analysis of the last 42 Visits per animal that exhibited Nosepokes during the Place Learning trial (i.e. well after preference was fully established), and the first 42 Visits that exhibited Nosepokes during the Reversal Learning trial. This specific number of Visits was used in the analysis to ensure comparability of discrimination ability, as this was the lowest number of Visits with Nosepokes performed by an animal during the Reversal Learning trial period.

  1. PlaceErrors: Visits of non-rewarded corners under Place Learning.
  2. ReversalErrors: Visits of non-rewarded corners in Reversal trial.
  3. RelearnScore: log[(PlaceErrors/42)/(ReversalErrors/42)], i.e. log(PlaceErrors) − log(ReversalErrors).

RelearnScore equals 0 when both error rates (errors per 42 Visits) are equal, and varies symmetrically around that value. Negative values mean that relatively fewer errors were made at the end of the Place Learning trial than during the initial Reversal phase, as would be expected if spatial reversal required a learning period with a temporary decrease in correct corner discrimination. Consequently, the higher the RelearnScore, the faster animals switched their preference to the newly rewarded corner.
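A small sketch of the score follows. The base of the logarithm is not stated in the text, so the natural logarithm is an assumption here; the function name `relearn_score` and the example error counts are likewise hypothetical.

```python
import math


def relearn_score(place_errors: int, reversal_errors: int, n_visits: int = 42) -> float:
    """Log ratio of the Place Learning error rate to the Reversal error rate.

    Assumes the natural logarithm; another base would only rescale the score.
    """
    if place_errors <= 0 or reversal_errors <= 0:
        # The log ratio is undefined when either error count is zero
        raise ValueError("both error counts must be positive")
    return math.log((place_errors / n_visits) / (reversal_errors / n_visits))


# Hypothetical animal: 5 errors before and 20 errors after reversal
print(relearn_score(5, 20))   # negative: more errors after reversal
print(relearn_score(10, 10))  # equal error rates give a score of zero
```

Because the number of Visits cancels in the ratio, the score depends only on the two error counts, which is why the text can equally write it as log(PlaceErrors) − log(ReversalErrors).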

Data processing

Data were taken on 288 mice (six cages in four labs with four females per three strains). In a few cases, transponders were lost so that some sample sizes are slightly reduced. Drinking Session Adaptation data for the first trial period as well as Free Adaptation data for the second trial period are missing because of technical problems in one research group each. Hence, for these phases sample size dropped to 264 and denominator degrees of freedom (ddf) to 36 and 18, for the respective F-tests (cf. Statistical model).

Several variables had to be transformed, and in a few cases extreme values had to be excluded, to make residual variation conform to the normality assumption. Residuals were taken to be normally distributed if Shapiro–Wilk and Kruskal–Wallis tests both exhibited P > 0.05. Limits for excluding extreme data points, transformation method and final sample size after data exclusion, if any, are noted in the ANOVA tables in the Results section. Untransformed data analyses did not give significant results for effects not significant with the transformed data. Place and Reversal Error rates could not be normalized as dispersion was too high. This means that significance levels of inferences on cognitive differences are most probably conservative (under)estimates.

Boxplots are used to allow full appreciation of the data dispersion. The interquartile range and median are drawn, as well as whiskers to a maximum of 1.5 times the interquartile distance from the box, and more extreme values are individually plotted. Boxes exhibit notches [1.58 × Interquartile range/√(N)] to characterize the error range of group locations (roughly equaling the standard error of the mean, SE, for normally distributed data). A symbol within boxes indicates the arithmetic mean. Even if variables have been transformed for statistical analyses, original data are plotted.
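The notch formula can be illustrated with a brief sketch. It assumes that the notch extends by 1.58 × IQR/√N on either side of the median, and it uses the default quartile definition of Python's `statistics.quantiles`, which may differ slightly from the one used for the published figures. The function name `notch_limits` is hypothetical.

```python
import math
import statistics


def notch_limits(values):
    """Return (lower, upper) notch limits around the median of a sample."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # Three cut points dividing the data into quarters (default 'exclusive' method)
    q1, _, q3 = statistics.quantiles(values, n=4)
    half_width = 1.58 * (q3 - q1) / math.sqrt(n)
    median = statistics.median(values)
    return median - half_width, median + half_width


# Hypothetical sample for illustration only
lower, upper = notch_limits([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(lower, upper)
```

Under this reading, notches of two groups that do not overlap point to a difference in group location.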

Results

Initial phase of Free Adaptation

Although there were significant overall laboratory effects, strains did not differ significantly in initial activity level and time to explore all corners, and there was no significant Strain × Lab interaction (Table 2).

Table 2. ANOVA table for initial phase of Free Adaptation (N = 287)

| Variable | Lab F(3,20) | Lab P< | Strain F(2,40) | Strain P< | Lab × Strain F(6,40) | Lab × Strain P< | Notes |
|---|---|---|---|---|---|---|---|
| InitialActivity | 8.39 | 0.001 | 0.11 | 0.90 | 0.70 | 0.65 | |
| ExplorePhase | 5.97 | 0.005 | 0.37 | 0.70 | 2.04 | 0.09 | <9500, log, N = 283 |

  1. Note exclusion of extreme values and log-transformation for analysis of ExplorePhase.

Last 2 days of Free Adaptation

All measures of activity pattern differed significantly between strains (Table 3). Frequency of Visits was lower for D2 than for B6 during night and day, with F1 activity falling in between (Fig. 2). Average Visit duration was higher for D2 than B6, with F1 again falling in between (Fig. 2). DayPattern was significantly higher in D2, i.e. D2 mice reduced activity during the day relatively more strongly than the other strains (Fig. 2). Gross laboratory variation reached significance only for nocturnal activity level (Table 3).

Table 3. ANOVA table for the last 2 days of Free Adaptation (N = 263)

| Variable | Lab F(3,18) | Lab P< | Strain F(2,36) | Strain P< | Lab × Strain F(6,36) | Lab × Strain P< | Notes |
|---|---|---|---|---|---|---|---|
| NocturnalActivity | 5.06 | 0.02 | 41.71 | 0.0001 | 1.90 | 0.11 | Sqrt, <570, N = 260 |
| DiurnalActivity | 0.41 | 0.76 | 122.09 | 0.0001 | 1.33 | 0.27 | Sqrt |
| VisitDuration | 2.93 | 0.07 | 66.25 | 0.0001 | 1.67 | 0.16 | Log, >5, N = 261 |
| DayPattern | 1.58 | 0.23 | 107.45 | 0.0001 | 0.53 | 0.79 | +1, squared |

  1. Note exclusion of extreme values and square, square root, as well as log-transformations.

Figure 2.

Boxplot of nocturnal (upper left panel) and diurnal Visit frequency (upper right), average Visit duration (lower left) and relative nocturnal activity (DayPattern, lower right), for the three strains and four laboratories (decreasing gray shades for EVOTEC, ISS, NKAR, UNIZH). See Table 3 for statistical comparisons.

Hence, D2 mice exhibit lower rates of visiting corners than B6 mice but stay longer inside chambers when visiting, and F1 activity falls in between. Furthermore, D2 reduced activity during the day relatively more strongly than the other two strains.

Nosepoke Adaptation phase

During Nosepoke Adaptation, Activity significantly differed between strains and was higher in B6 than D2. During this phase, the activity of F1 did not fall in between that of the parental strains but was the lowest, on average, close to D2 level (Table 4; Fig. 3). D2 mice exhibited a higher proportion of Visits with Nosepokes than the other two strains, while duration of Nosepokes did not differ significantly between strains (Table 4; Fig. 3). When nosepoking, D2 mice conducted a slightly higher number of Nosepokes per Visit, which led to significant variation (Table 4; Fig. 3). All variables were significantly affected by laboratory identity (Table 4); however, no significant Lab × Strain interaction was seen.

Table 4. ANOVA table for the Nosepoke Adaptation phase (N = 287)

| Variable | Lab F(3,20) | Lab P< | Strain F(2,40) | Strain P< | Lab × Strain F(6,40) | Lab × Strain P< | Notes |
|---|---|---|---|---|---|---|---|
| Activity | 3.53 | 0.04 | 20.25 | 0.0001 | 1.53 | 0.20 | Log |
| PokeProportion | 9.81 | 0.001 | 18.49 | 0.0001 | 1.41 | 0.24 | Arc sine |
| PokesPerVisit | 7.79 | 0.01 | 3.78 | 0.04 | 1.68 | 0.16 | Log |
| PokeDuration | 12.56 | 0.0001 | 0.30 | 0.74 | 2.28 | 0.06 | Log, <4.5, N = 284 |

  1. Note arc sine as well as log-transformations for analyses of variables. Three extremely high Nosepoke values had to be excluded to achieve normality of residuals for PokeDuration.

Figure 3.

Boxplot of Visit frequency (upper left panel), proportion of Visits exhibiting Nosepokes (upper right), average number of Nosepokes per Visit (lower left) and mean Nosepoke duration (lower right) for the three strains and four laboratories (decreasing gray shades for EVOTEC, ISS, NKAR, UNIZH). See Table 4 for statistical comparisons.

Although B6 mice were most active, F1 activity did not seem to fall between the parental strains as under Free Adaptation, but activity was very similar to D2. D2, but not F1, mice countered lower visiting rate by exhibiting an increased incidence and frequency of Nosepokes per Visit. This implies that F1 mice fall with D2 regarding their activity level, but with B6 regarding nosepoking rate, when Nosepokes are required for access to water.

Last night of Drinking Session Adaptation

During drinking sessions, B6 visiting rate was significantly higher than in D2 and F1 (Table 5; Fig. 4), resembling the relationship of activity levels during Nosepoke Adaptation. During non-drinking times, activity pattern equaled that of Free Adaptation, i.e. B6 had higher rates than D2 and F1 activity fell in between (Table 5; Fig. 4). As for the DayPattern difference under Free Adaptation, D2 reduced activity most strongly during non-drinking times, though the effect was small (Fig. 4) and significance marginal (Table 5). The gross laboratory effect turned up significant only during the high activity phase during drinking sessions, resembling Free Adaptation relations (Table 5).

Table 5. ANOVA table for Drinking Session Adaptation (N = 262)

| Variable | Lab F(3,18) | Lab P< | Strain F(2,36) | Strain P< | Lab × Strain F(6,36) | Lab × Strain P< | Notes |
|---|---|---|---|---|---|---|---|
| DrinkActivity | 4.24 | 0.02 | 18.98 | 0.0001 | 0.85 | 0.55 | >3, N = 259 |
| DryActivity | 0.36 | 0.79 | 18.58 | 0.0001 | 2.20 | 0.07 | |
| DrinkPreference | 2.07 | 0.15 | 3.87 | 0.04 | 0.71 | 0.65 | >−0.8, N = 259 |

  1. Note exclusion of three extreme outliers for DrinkActivity and DrinkPreference where two mice did not visit at all and one mouse did so only three times during drinking sessions.

Figure 4.

Boxplot of Visit frequency during drinking sessions (top panel) and outside drinking session (middle), as well as relative activity during drinking sessions (DrinkPreference, bottom), for the three strains and four laboratories (decreasing gray shades for EVOTEC, ISS, NKAR, UNIZH). See Table 5 for statistical comparisons.

In consequence, the data are consistent with the prior finding of B6 exhibiting higher activity levels than D2 and F1 falling in between when unconstrained, and with the finding that F1 levels fall to D2 levels when drinking becomes conditional on nosepoking. Also, relative activity reduction during ‘resting’ times appears to be consistent with the findings under Free Adaptation.

Place Learning and Reversal Learning

Learning parameters differed significantly between strains (Table 6). B6 mice exhibited a lower rate of errors at the end of the Place Learning phase than D2, but had a higher rate of errors after reversal of the rewarded corner than D2 (Fig. 5). F1 discriminated the least at the end of Place Learning, and performed intermediately during the initial trials after reversal, in both cases keeping a slightly higher error rate than D2.

Figure 5.

Boxplot of place errors before (top panel) and after place reversal (middle), as well as reversal performance (Relearn Score, bottom), for the three strains and four laboratories (decreasing gray shades for EVOTEC, ISS, NKAR, UNIZH). See Table 6 for statistical comparisons.

In consequence, B6 took longer to reach prior discrimination levels, whereas D2 and F1 had similar RelearnScores (Fig. 5). Laboratory effects tended to be significant for those three measures (Table 6).

Discussion

Differences between laboratories were apparent for the majority of behavioral and learning variables, in spite of our efforts to standardize and synchronize animal treatment and experimental conduct. As outlined in the introductory part, this finding supports the notion that subtle and, in practice, uncontrollable differences in handling and environmental circumstances exert measurable effects on animal activity patterns, which in turn can influence behavioral measures, including those concerning learning ability. However, the more pertinent question is whether such random differences could interfere with quantitative differences between strains so as to lead to superficial results regarding the nature of strain differences in behavioral and/or cognitive traits (Schielzeth & Forstmeier 2009). In this respect, our results show that while the behavioral and learning performances significantly differed between strains, these differences were consistent between laboratories, i.e. no significant laboratory by strain interactions showed up in the data. In other words, inferences taken from the experimental results would not differ between laboratories, in contrast to findings in other studies using different methodologies (cf. introductory paragraph). In particular, in the context of another multilab study where the same three strains had been tested in three different labs in a battery of conventional tests involving open field, elevated O-maze, object exploration and water-maze place navigation, highly significant strain × laboratory interactions were found in all tests (Wolfer et al. 2004). Given that the human experimenter is a major source of variation in conventional behavioral experiments (Chesler et al. 2002; Crabbe et al. 1999), there is good reason to suspect that the near-complete elimination of human interference with animal behavior by use of fully automated testing within the home cage has strongly contributed to the enhanced consistency of strain differences across laboratories in the present study.

The second major difference between the present and other multilab studies is the fact that in the IntelliCage animals are tested in a socially and environmentally enriched setting rather than individually in ‘traditional’ laboratory behavioral apparatus. As shown previously (Wolfer et al. 2004), enriched housing before and between ‘traditional’ behavioral tests does not increase between-laboratory variation of strain differences. The absence of significant laboratory by strain interactions in the present study shows that social and environmental enrichment does not even interfere with the consistency of strain effects if present throughout the study, including behavioral testing itself. However, social interactions during behavioral testing add a new dimension to the procedures and most likely affect the magnitude and direction of strain effects on many behavioral variables independent of laboratory. The determination of the extent of this contribution awaits further research, as several platforms exist that were designed for fully automated testing of individually housed mice in their home cage (Chen et al. 2005; de Visser et al. 2005; Goulding et al. 2008; Van de Weerd et al. 2001), but none of them has to date been evaluated in a multilab study.

The most prominent feature of the IntelliCage apparatus is the use of nosepoking as endpoint behavior in conditioning paradigms. This creates tasks that fit more appropriately into the behavioral repertoire of mice than behavioral tokens taken in most conventional test paradigms, thereby potentially leading to more robust behavioral expression that is less apt to react to subtle maintenance differences. Hence, although IntelliCage tasks cannot directly be compared with conventional procedures at the level of behavioral measures they may turn out to be much more appropriate and replicable as means to study underlying processes in the brain. Although further research is needed, our current study indicates that IntelliCage studies could potentially provide more reliable information about motivational, emotional and cognitive control mechanisms than conventional tests.

Activity measures

Our results indicate that there were no strain differences during the initial 30 min of Free Adaptation, which may appear surprising, given that these three strains show clear differences in conventional tests of exploratory behavior (Wolfer et al. 2004). Several factors may have contributed to the failure to find strain differences during this initial phase of the experiment. Much of the reason for the lack of a difference may lie in the extreme variability underlying the novelty situation caused by introducing a number of unfamiliar animals into the same cage, which results, eventually, in the formation of a relatively stable ‘social structure’. When mice are put into an IntelliCage with social partners and unknown individuals at the same time, diverse coping behaviors imply that for some animals the actual exploration of the cage might start with a considerable delay, and motivation to do so will also shift and differ over time. The effect of social interactions during the early adaptation phase could be tested by placing mice individually or in same-strain groups into IntelliCages for the first time. From a practical point of view, the socialization phase could also be avoided by forming cohorts sufficiently long before placing the animals into IntelliCages.

In line with higher activity of B6 that is found in most (Kafkafi et al. 2005; Logue et al. 1997a; Wolfer et al. 2004), although not all (Podhorna & Brown 2002), conventional open field tests, B6 mice were the most active in terms of Visit frequency and D2 the least active. The activity levels of F1 fell in between those of the parent strains when unconstrained and to D2 levels when drinking became conditioned by nosepoking. D2 showed longer Visits with more Nosepokes and stronger reduction of Visit frequency during light phase and non-drinking time during the drinking sessions protocol. During experiments requiring nosepoking for access to water, F1 mice followed D2 mice in showing similar activity levels that were lower than B6 mice, but equaled B6 mice in nosepoking rates. These findings indicate that at least two independent determinants are inherited that shape activity patterns in IntelliCages, as otherwise the differential response of activity level and nosepoking rate dependent on circumstances of F1 mice could not be resolved.

Learning measures

All strains mastered the place learning tasks well, including place reversal, but there were differences in performance. B6 discriminated more strongly than D2 at the end of preference acquisition, whereas D2 appeared to be faster at re-learning after spatial reversal, indicating that the replacement of a former spatial preference by a new one is more efficient in our D2 mice. F1 showed the poorest performance in these tasks, as they discriminated less accurately than D2 before and after place reversal.

A comparison with our findings on activity levels shows that if the more accurate spatial discrimination of B6 mice stemmed from their lower nosepoking rate, F1 should have approached B6 discrimination levels. Likewise, if lower overall activity were related to lower spatial accuracy, F1 values would be expected to equal D2 accuracy. The fact that F1 hybrid mice discriminate even less accurately than D2 prereversal and about as accurately as B6 postreversal may indicate that such differences were based on different problem-solving strategies involved in the expression of spatial discrimination. One possibility would be that spatial learning speed as well as the propensity to commit unrewarded Nosepokes are higher in D2 than in B6 mice, and that F1 mice fall with B6 in learning speed but with D2 in preference accuracy levels.

Whether and how place learning tasks in the IntelliCage environment can be compared with spatial tasks in conventional water-, radial- or T-mazes remains to be established further by using other test schedules in the IntelliCage. In any case, the superiority of B6 mice over D2 in acquiring the IntelliCage place learning task is in line with the results of several water-maze (Logue et al. 1997b; Schopke et al. 1991; Wolfer & Lipp 2000; Wolfer et al. 2004) as well as radial maze studies (Crusio et al. 1987; Rossi-Arnaud et al. 1991). In the water-maze, however, B6 also proved to be more efficient reversal learners than D2 (Schopke et al. 1991), which is in contrast to the reversal experiment in the present study. Although the relative ranking of B6 and D2 strains agrees at least in part with the existing literature, the overall poor performance of F1 in the IntelliCage place learning task is surprising, given the good performance of these hybrid mice (both males and females) in water-maze tasks (Logue et al. 1997b; Wolfer & Lipp 2000).

Conclusion

Under a thoroughly standardized protocol, we have run a synchronized multilaboratory behavioral study in order to assess the validity of a recently developed behavioral phenotyping device: the IntelliCage. Here, we provide evidence that under our experimental settings, differences in behavioral and cognitive traits between inbred strains of mice can readily be detected and are robust to the identity of the laboratory, i.e. the differences between strains remained equivalent, regardless of laboratory identity. We suspect that the consistency of the IntelliCage with respect to behavioral phenotype measures stems from the exclusion of human interaction with animals during test trials and, possibly, the more heterogeneous status of animals because of environmental enrichment and social variability, as well as a more appropriate type of behavioral endpoint, i.e. the nosepoke. However, we would like to caution against premature conclusions, as other studies have used larger numbers of strains and/or sample sizes, which might have given them higher power to detect lab × strain interactions. Ultimately, direct experimental comparisons of methods conducted in identical laboratories are needed to test for differences in equivalence produced by divergent behavioral measures and methods.

Acknowledgments

This study was conducted in partial fulfillment of European Sixth Framework Programme project no. 037965 ‘INTELLIMAZE’. S.K. kindly acknowledges partial support by the NCCR ‘Neural Plasticity and Repair’. A.C. and A.H.M. gratefully acknowledge the support of The Wallenberg Foundation. We are highly appreciative of the insightful comments by two anonymous referees.
