Reduced sociability (reduced tendency to seek social interaction) is a core feature of autism spectrum disorders (ASD) that is highly disabling and for which effective treatments are sorely lacking (Hill and Frith,2003). The biological basis of social impairments in ASD remains largely unexplained. A vital research priority is to develop mouse models relevant to ASD, because the experimental control that model systems afford will be indispensable for unraveling the complex biological basis of sociability deficits. An important part of these efforts to develop model systems will be refining methods for measuring sociability in mice.
Social affiliative behaviors have been studied in mice using several experimental paradigms. Mice may be observed freely interacting in a novel environment (Social Interaction Test; de Angelis and File,1979; File and Seth,2003) or in their home cages (Lijam et al.,1997; Mondragon et al.,1987; Terranova et al.,1994) to allow quantification of naturalistic behaviors, including passive social behaviors, such as huddling together, as well as more active social behaviors, such as sniffing and allogrooming. An advantage of these assays is their ecological validity: the behaviors of these freely interacting mice resemble those of feral mice in their natural environments. A disadvantage is the complexity of these social interactions: because either or both mice can easily initiate, maintain, or modulate an interaction, disentangling the contributions that each mouse makes to the interaction can be difficult.
By contrast, more controlled social affiliation assays appear to be less ecologically valid but simplify the social interaction, making it more feasible to measure the tendency of a specific mouse to approach or avoid another mouse. These controlled assays include the Partition Test (Kudryavtseva,2003; Moretti et al.,2005; Spencer et al.,2005) and the Social Choice (or Social Approach) Test (Brodkin et al.,2004; Moy et al.,2004; Nadler et al.,2004; Sankoorikal et al.,2006). The Social Choice Test is conducted in a three-chambered apparatus, or box, with a transparent, air-permeable cylinder in each of the two end chambers. After a period in which the “test” mouse is habituated to the apparatus, a “stimulus” or “target” mouse is then confined inside one cylinder, so that the stimulus mouse cannot approach the test mouse to initiate or maintain a social interaction. With only enough space to turn around, the stimulus mouse is always close to or against the cylinder wall, so that it can be easily sniffed through the holes in the cylinder wall and otherwise investigated by the test mouse, which is free to move throughout the box. This high degree of control limits affiliative behaviors, especially by the stimulus mouse, but ensures that any active social interaction can occur only if the test mouse initiates and maintains that interaction. Thus, the social choice assay is well suited for isolating and quantitatively measuring the sociability of the test mouse.
One may analyze sociability in the Social Choice Test using any of several measures. “Social chamber time” can be defined as the amount of time that the test mouse spends in the end chamber that contains the stimulus mouse, and “social cylinder time” can be defined as the amount of time that the test mouse sniffs and otherwise investigates the cylinder that contains the stimulus mouse. One may calculate “chamber/cylinder preference” scores and “chamber/cylinder preference change” scores (see Methods and Materials), which theoretically may improve control of the analysis (Sankoorikal et al.,2006). Thus, one may potentially assess sociability by six different but related scores: social chamber time, social cylinder time, chamber preference, cylinder preference, chamber preference change, and cylinder preference change.
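As a rough illustration of how the six scores relate to one another, the sketch below computes them from raw times. The exact definitions are given in Methods and Materials and are not reproduced here, so the code rests on two assumptions: that a preference score is the social time minus the nonsocial time, and that a preference change score is the preference during the Social Choice phase minus the corresponding difference during the preceding habituation phase. The `PhaseTimes` container, its field names, and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhaseTimes:
    """Seconds spent by the test mouse in one phase (hypothetical container)."""
    social_chamber: float      # end chamber assigned to the stimulus mouse
    nonsocial_chamber: float   # opposite end chamber
    social_cylinder: float     # sniffing/investigating the social cylinder
    nonsocial_cylinder: float  # sniffing/investigating the other cylinder


def sociability_scores(habituation: PhaseTimes, choice: PhaseTimes) -> dict:
    """Return the six scores under the ASSUMED definitions in the lead-in."""
    # Assumption: preference = social time minus nonsocial time.
    chamber_pref = choice.social_chamber - choice.nonsocial_chamber
    cylinder_pref = choice.social_cylinder - choice.nonsocial_cylinder
    # Same differences during habituation, before the stimulus mouse is present.
    prior_chamber = habituation.social_chamber - habituation.nonsocial_chamber
    prior_cylinder = habituation.social_cylinder - habituation.nonsocial_cylinder
    return {
        "social_chamber_time": choice.social_chamber,
        "social_cylinder_time": choice.social_cylinder,
        "chamber_preference": chamber_pref,
        "cylinder_preference": cylinder_pref,
        # Assumption: change = Social Choice preference minus prior preference.
        "chamber_preference_change": chamber_pref - prior_chamber,
        "cylinder_preference_change": cylinder_pref - prior_cylinder,
    }


# Made-up example values, in seconds.
habituation = PhaseTimes(100.0, 110.0, 10.0, 12.0)
choice = PhaseTimes(160.0, 80.0, 60.0, 8.0)
scores = sociability_scores(habituation, choice)
```

Under these assumed definitions, the preference and preference change scores are simple differences built on the two raw times, so each adds information about nonsocial investigation or prior side preference to the raw social time.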
This multiplicity of scores raises the question of whether any of the scores are more valid than the others. If all of the scores always yield the same conclusion, then they are redundant. However, if the scores sometimes disagree, then on which resulting conclusion should one rely for further research?
The scores usually support the same conclusions (Nadler et al.,2004; Sankoorikal et al.,2006; Crawley et al.,2007; Yang et al.,2007b; Ryan et al.,2008), but they sometimes disagree with each other, and this has created some ambiguity in published studies. Moy et al. (2007) tested 10 inbred strains of mice for sociability and, in an analysis akin to a preference score, identified three strains for which chamber and cylinder scores disagreed on whether the strains should be considered sociable. Similar disagreements occurred in one cohort of vasopressin receptor 1B (Avpr1b) null mutants and heterozygotes tested during the circadian light phase (Yang et al.,2007a; Fig. 3I,J) and in one cohort of Fragile X mental retardation 1 mutants on an FVB/129 genetic background (Moy et al.,2009; Figs. 2B, 3A). Fairless et al. (2008) hypothesized a positive correlation within the BALB/cJ inbred mouse strain between sociability and the size of the corpus callosum. The chamber preference change score showed no such correlation, yet the cylinder preference change score did. Such discrepancies may be resolved by determining which, if any, of the sociability scores are more valid than the others.
There are many criteria by which to assess the validity (henceforth called “general validity”) of a measurement. One criterion is test-retest reliability. Assuming that a behavior is temporally stable and is not substantially changed by the testing procedure, measurements of that behavior in the same mice at different times should yield a positive correlation. This approach has been used to study rodent behaviors in the elevated plus maze (Lister,1987; Rodgers et al.,1997; Andreatini and Bacellar,2000), forced swim test (Drugan et al.,1989; Hilakivi and Lister,1990), free-exploratory paradigm (Teixeira-Silva et al.,2009), open field (Henderson,2005), and other experimental paradigms. A second criterion by which to generally validate an experimental measurement is its ecological validity, or how closely the measurement relates to behaviors in naturalistic situations. One way to assess ecological validity is to measure the correlation between behaviors in one test and behaviors in a more naturalistic or ecologically relevant test. Other criteria of general validity of animal models of human disease, such as etiological/construct validity and predictive validity (Crawley,2004), are beyond the scope of this study.
To assess the test-retest reliability of the six sociability scores, we reanalyzed data from a previously published experiment (Sankoorikal et al.,2006) in which the same mice underwent the Social Choice Test once per day on two consecutive days. At the conclusion of the Social Choice Test on the second day, the cylinders were removed so that the test and stimulus mice could both move about and interact freely. This phase of the test closely resembled the Social Interaction Test and was more naturalistic than the Social Choice Test. To assess the ecological validity of the six sociability scores, we correlated those scores with the amount of time that the test mouse investigated the stimulus mouse during this “Free Social Interaction” period. We hypothesized that the chamber preference change score would show the highest test-retest reliability and that the cylinder preference change score would show the highest ecological validity.
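The two analyses described above can be sketched as a pair of Pearson correlations computed across mice. The code below is a minimal illustration, not the analysis pipeline used in this study: the `pearson_r` helper is written out by hand for self-containment, and the three lists hold made-up values for five hypothetical mice.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values in seconds; each index is the same mouse in every list.
day1_cylinder = [40.0, 55.0, 20.0, 70.0, 35.0]     # Social Choice, Day 1
day2_cylinder = [45.0, 50.0, 25.0, 65.0, 30.0]     # Social Choice, Day 2
free_interaction = [30.0, 42.0, 18.0, 50.0, 22.0]  # Free Social Interaction

# Test-retest reliability: the same score measured on two days.
test_retest_r = pearson_r(day1_cylinder, day2_cylinder)

# Ecological validity: a Social Choice score versus the more naturalistic
# Free Social Interaction measure.
ecological_r = pearson_r(day2_cylinder, free_interaction)
```

The same two correlations would be computed for each of the six scores, which makes the scores directly comparable on both criteria.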
Our data analysis in this study indicates that cylinder scores are more reliable and ecologically valid measures than chamber scores for measuring sociability in the Social Choice Test. Automated tools that locate a mouse are well established and allow chamber scores in the Social Choice Test to be easily obtained (Nadler et al.,2004; Page et al.,2009). However, fewer automated methods exist for recording sniffing and other active behaviors on which the cylinder scores are based. Here we evaluate software that can accurately measure cylinder scores. We hypothesized that an automated video analysis system could measure cylinder scores with accuracy comparable to that of a human rater.
Finally, some have suggested that a measurement of the mouse's proximity to the social cylinder, that is, a measurement of the amount of time that the test mouse spends in an area that is near the social cylinder but smaller than the entire social chamber, would be a valid alternative measurement of sociability (alternative to chamber and cylinder scores) in the Social Choice Test (Page et al.,2009). We hypothesized that this alternative measurement would correlate highly with social cylinder time and thus provide an adequate substitute for directly measuring cylinder scores.
Contrary to our hypothesis, cylinder scores, not the chamber preference change score, showed the highest test-retest reliability. Additionally, cylinder scores obtained on Day 2 were more ecologically valid than Day 2 chamber scores. Although the ecological validity of Day 1 cylinder scores appeared to be higher than the ecological validity for Day 1 chamber scores, this difference did not reach statistical significance. Yet even in this case, all the cylinder scores attained higher correlations than did any of the chamber scores. Thus overall, cylinder scores achieved higher test-retest reliability and ecological validity than did chamber scores.
Our sample size did not provide sufficient statistical power to detect the relatively small differences among social chamber/cylinder times, preference scores, and preference change scores. Yet notably, the social cylinder time showed the highest test-retest reliability of all the scores. Furthermore, the social cylinder time, and not the hypothesized cylinder preference change score, showed the highest ecological validity, though only slightly higher than the other cylinder scores. Thus, the three cylinder scores were more generally valid than the three chamber scores, and the social cylinder time may be the most generally valid of the three cylinder scores; however, additional study with larger sample sizes would be required to determine whether the latter point is true.
We had hypothesized that preference change scores would be the most generally valid scores because they theoretically control for a mouse's tendency to explore a nonsocial stimulus (as do preference scores) and for its individual preference for a chamber or cylinder. It was therefore surprising that the cylinder preference and cylinder preference change scores were not more generally valid than social cylinder time in any case. The information about nonsocial stimulus investigation and prior chamber/cylinder preference may be only weakly related, or unrelated, to sociability, so by including it, these scores may introduce nearly random “noise” to the “signal” of social cylinder time. The lower test-retest reliability of these complex scores, compared to social cylinder time, may support this notion, but a similar pattern does not appear for ecological validity. Regardless, the experimenter should be cautioned against assuming that a more complex score necessarily improves the general validity of a behavioral analysis; a more complex score may sometimes decrease general validity.
The superior general validity of cylinder scores over chamber scores suggests that sociability should be measured primarily by including only the active behaviors that are most directly related to social investigation. The predominant active social behavior is sniffing the cylinder. When its nose is in contact with the cylinder, the test mouse can likely perceive both volatile and nonvolatile odorants from the stimulus mouse (Luo et al.,2003; Brennan and Kendrick,2006; Sanchez-Andrade and Kendrick,2009). Other active social behaviors include scratching, gnawing, climbing on, and rearing against the cylinder. Chamber scores may include other, passive social behaviors that can occur with some distance between the mice, such as when the test mouse chooses to be near another mouse, watches that mouse, or smells volatile odorants that have diffused some distance from that mouse. But these scores also include behaviors that are not clearly social, such as sniffing the chamber walls, walking through the chamber but not towards the social cylinder, and remaining still next to the chamber wall. Likewise, low locomotor activity may substantially affect a chamber score. Excluding these behaviors by accounting for only active behaviors directed toward the social cylinder yields a more generally valid measurement of sociability.
Because this study analyzed a heterogeneous group of mice, some conclusions may not apply evenly across all subgroups. With no more than 20 mice in each subgroup, statistical power was not sufficient to test robustly for correlational differences among the subgroups, and the estimates of the magnitude of the correlations for each subgroup are imprecise. However, some general patterns are noteworthy.
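To give a sense of how imprecise a correlation estimate is at this subgroup size, the following sketch computes an approximate 95% confidence interval for Pearson's r using the Fisher z-transformation. This is a standard textbook approximation and not necessarily the method used in this study; the function name `fisher_ci` and the example value r = 0.5 are illustrative assumptions.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson's r via Fisher's z.

    r      : observed correlation, strictly between -1 and 1
    n      : number of paired observations (must exceed 3)
    z_crit : normal critical value (1.96 for an approximate 95% interval)
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                # Fisher z-transform of r
    se = 1.0 / math.sqrt(n - 3)      # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Made-up example: r = 0.5 observed in a subgroup of 20 mice.
low, high = fisher_ci(0.5, 20)
```

With these inputs the interval runs from roughly 0.07 to 0.77, so a moderate observed correlation in 20 mice is compatible with anything from a negligible to a strong underlying relationship.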
Social cylinder time showed higher test-retest reliability and ecological validity than social chamber time for nearly all subgroups. This was not true for the test-retest reliability of the 4-week-old BALB/cJ females, which behaved inconsistently across test sessions. Even so, for no experimental group did social chamber time indicate greater reliability than social cylinder time. For ecological validity, the 9-week-old C57BL/6J females were an exception, where the social chamber time correlation exceeded that of social cylinder time. However, the chamber time correlation was near zero, while the cylinder time correlation was surprisingly negative, though not of high magnitude. Thus even in this case, the social cylinder time may show a relationship that the chamber time does not show. In sum, there is no evidence that the chamber scores are more generally valid than the cylinder scores for any subgroup.
The correlations presented here are based on the behaviors of individual mice. While assessing reliability on an individual level is a common approach (Lister,1987; Drugan et al.,1989; Hilakivi and Lister,1990; Andreatini and Bacellar,2000; Henderson,2005; Teixeira-Silva et al.,2009), studies of anxiety-related behaviors suggest that examining behaviors on a group level can yield different results (Ramos,2008). In some cases, group-level analyses were able to detect behavioral correlations that were not present at an individual level. Thus, the possibility remains that a group-level analysis could detect higher general validity of chamber scores than has been found here. In developing the Social Choice Test, Moy et al. (2004) presented evidence that chamber scores are generally reliable at a group level: adult C57BL/6J and DBA/2 mice showed largely similar chamber scores between a test and re-test 11–12 days later. However, cylinder scores were not reported in this experiment, so it is unclear whether the chamber scores' reliability equals that of the cylinder scores at a group level. Given the large difference in general validity between chamber and cylinder scores found here, it is unlikely that a group-level comparison of chamber and cylinder scores would undermine our recommendation to primarily use cylinder scores to evaluate sociability.
This study was limited by the use of archival data (Sankoorikal et al.,2006) that were not originally designed to answer questions on the general validity of the sociability scores. One limitation was potential test order effects: the interactions of the mice during Phase 2 (Social Choice) might have affected their subsequent interactions in Phase 3 (Free Social Interaction). No mice were tested in Phase 3 before Phase 2 to identify any test order effects. Furthermore, a test mouse was exposed to the same stimulus mouse for Phase 2 on Day 2 and for Phase 3 (also on Day 2), which may have attenuated their interaction during Phase 3 due to a habituation effect. However, any attenuation of social interaction that affected the mice fairly uniformly would not have greatly affected the correlations between Phase 2 and Phase 3, which were based on Pearson's r. Notably, attenuation of social interaction (habituation) seems even less likely between Phase 2 on Day 1 and Phase 2 on Day 2, because each test mouse was tested with different stimulus mice on Day 1 and Day 2 and because of the day-long interval between tests. Additionally, any test order effects might have been minimal: the effects of prior testing experience depend on the specific paradigms used and do not necessarily affect results substantially (McIlwain et al.,2001; Henderson,2005).
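The point that uniform attenuation leaves the correlations largely intact follows from a general property of Pearson's r: it is unchanged when one variable is multiplied by a positive constant or shifted by a constant. The sketch below illustrates this with made-up values; the 40% reduction is a hypothetical habituation effect, not an estimate from the data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up values in seconds for six hypothetical mice.
phase2_cylinder = [40.0, 55.0, 20.0, 70.0, 35.0, 60.0]
phase3_unattenuated = [50.0, 62.0, 30.0, 85.0, 41.0, 66.0]

# Hypothetical uniform habituation: every mouse interacts 40% less.
phase3_attenuated = [0.6 * t for t in phase3_unattenuated]

r_before = pearson_r(phase2_cylinder, phase3_unattenuated)
r_after = pearson_r(phase2_cylinder, phase3_attenuated)
# r_before and r_after agree up to floating-point rounding.
```

Attenuation that differed from mouse to mouse would not be covered by this property and could change the correlation, which is why the argument applies only to effects that are fairly uniform.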
The Social Choice Test is a highly controlled assay for social affiliation, and this high level of control entails curtailing some naturalistic aspects of social interactions between mice. Confining the stimulus mouse to a cylinder in Phase 2 allows one to isolate, to some degree, the sociability of the test mouse. But it also alters the quality or nature of the social interaction, because the confinement of the stimulus mouse limits its ability to initiate, maintain, and terminate a social interaction and to respond to social cues from the test mouse. Social behaviors of the test mouse may also be affected by being in a novel environment, which can induce exploratory and anxiety-related behaviors, and by the inability to fully contact the stimulus mouse due to the presence of a partial barrier between them (cylinder wall with holes in it). However, it is worth noting that this controlled social interaction shows some similarity to a more naturalistic interaction, as shown by the positive correlations between the social measures of Phase 2 (Social Choice) and Phase 3 (Free Social Interaction; Fig. 3). Moreover, we have chosen to regularly include a Free Social Interaction phase in the Social Choice Test in all of our studies (Brodkin et al.,2004; Sankoorikal et al.,2006; Fairless et al.,2008), to include both a more controlled and a more naturalistic way of observing social interactions in the context of the Social Choice Test.
Phase 3 (Free Social Interaction) is more naturalistic than Phase 2 (Social Choice), during which the stimulus mouse is confined to a cylinder, because both mice can move freely in Phase 3. Nevertheless, Phase 3 still differs substantially from a social situation between feral mice in their natural environment. Among many other artificial factors, the mice in the Free Social Interaction (Phase 3) are laboratory-bred; are restricted to a novel, artificial environment; and interact in the presence of a human. Strategies that reduce or eliminate such factors to attain more naturalism, such as observing mice in home cage environments or semi-natural burrow habitats, can be related to the Social Choice Test to further investigate its ecological validity. Importantly, mice of the inbred strain BTBR T+ tf/J show lower levels of social behavior than C57BL/6J mice in both the Social Choice Test and semi-natural burrow habitats (McFarlane et al.,2008; Pobbe et al.,2010). Unlike the results of this study, these results are based on group-level analyses, but they do support the notion that results from the Social Choice Test can be relevant to more naturalistic social situations.
Since the archival data were collected, the procedure for the Social Choice Test has been altered. In the earlier experiment (Sankoorikal et al.,2006; this study, Experiment 1) the stimulus mouse was placed into one cylinder while the other cylinder remained empty at the start of Phase 2. When the stimulus mouse was introduced, it was both a social stimulus and a novel stimulus. To control for novelty as a possible confound, subsequent experiments have included a novel object that is introduced into the other (nonsocial) cylinder at the same time that the stimulus mouse is introduced into the social cylinder at the start of Phase 2 (Fairless et al.,2008; this study, Experiment 2). Given this change, it is possible that the results concerning test-retest reliability and ecological validity from Experiment 1 would not apply well to subsequent experiments. We consider this unlikely because the procedure change (presence of the novel object in the nonsocial cylinder) has not substantially changed behaviors of test mice: the test mice generally sniff the nonsocial cylinder little compared with the social cylinder using either procedure, and experimental results in C57BL/6J and BALB/cJ mice have been very similar before and after the procedural change (e.g., juvenile BALB/cJ mice consistently have shown lower sociability than juvenile C57BL/6J mice, both before and after the procedure change; Sankoorikal et al.,2006; Fairless et al.,2008).
Tools that can automate the measurement of chamber scores are well established (Nadler et al.,2004; Page et al.,2009) and widespread, and this may account for the prevalence of using only chamber scores to assess sociability in the Social Choice Test. Given the cylinder scores' superior general validity indicated in our study, exclusive use of chamber scores may produce a higher rate of undetected false positives and false negatives in the Social Choice Test. To facilitate the use of cylinder scores, we have validated the software TopScan for automated measurement of cylinder sniffing in the Social Choice Test. At the settings that we specified, TopScan performs as well as or better than human raters at this task, as we had hypothesized.
Some have suggested that a mouse's proximity to a cylinder provides an adequate measure of sociability (Page et al.,2009). Contrary to this hypothesis, our data show that this approach measures sniffing less accurately than does direct measurement of cylinder sniffing, whether by manual or automated methods. We have observed that test mice often walk beside or along the cylinder wall, but orient their heads towards the cylinder for only brief, intermittent periods to sniff. This behavior may account for much of the discrepancy between the “cylinder proximity” measurements and our recommended “cylinder sniffing” approach. In summary, use of the cylinder proximity approach may risk a higher rate of false positives and false negatives in assessing sociability in the Social Choice Test; our results support the use of direct measurements of cylinder sniffing.
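The difference between the two approaches can be illustrated with a toy frame-by-frame classification. The sketch below is purely hypothetical and does not describe how TopScan or any other tracking package works: it assumes that each video frame provides a body-center and a nose coordinate, and the function name, zone radius, sniffing margin, and coordinates are all invented for illustration.

```python
import math


def frame_counts(frames, center, cyl_radius, zone_radius, sniff_margin):
    """Count frames classed as 'proximity' and as 'sniffing'.

    frames       : list of (body_xy, nose_xy) tuples, one per video frame
    center       : (x, y) of the cylinder center
    cyl_radius   : cylinder radius
    zone_radius  : radius of the proximity zone around the cylinder center
    sniff_margin : maximum nose-to-wall distance counted as sniffing
    All distances share one unit (e.g., cm); all thresholds are hypothetical.
    """
    proximity = sniffing = 0
    for body, nose in frames:
        # Proximity: the body center lies inside the zone around the cylinder.
        if math.dist(body, center) <= zone_radius:
            proximity += 1
        # Sniffing: the nose is at, or very near, the cylinder wall.
        if math.dist(nose, center) <= cyl_radius + sniff_margin:
            sniffing += 1
    return proximity, sniffing


# Made-up frames: a mouse walks alongside the cylinder and sniffs it once.
frames = [
    ((9.0, 0.0), (5.5, 0.0)),    # beside the cylinder, nose at the wall
    ((9.0, 2.0), (10.0, 5.0)),   # beside the cylinder, head turned away
    ((8.0, 5.0), (9.0, 8.0)),    # still beside the cylinder, walking past
    ((20.0, 0.0), (17.0, 0.0)),  # far from the cylinder
]
proximity_frames, sniffing_frames = frame_counts(
    frames, center=(0.0, 0.0), cyl_radius=5.0, zone_radius=12.0, sniff_margin=1.0
)
```

In this toy example the proximity rule counts three frames while the sniffing rule counts only one, mirroring the observation that a mouse walking along the cylinder wall accrues proximity time without investigating the stimulus mouse.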
The higher general validity of cylinder scores compared with chamber scores suggests that active investigation of a conspecific is the predominant component of sociability in the Social Choice Test. Sociability, the tendency to approach and affiliate with an unfamiliar conspecific, is a relatively simple social behavior, but it is important in many species as a prelude to more complex behaviors, such as the formation of social bonds. Research into the biological factors that influence sociability in mouse models of ASD may eventually yield insight into the social impairments of ASD, and optimal measurement of sociability is essential to obtaining clear results in this endeavor.