Relating ultrasonic vocalizations from a pair of rats to individual behavior: A composite link model approach

Ultrasonic vocalizations (USVs) are crucial in the social behavior of rats. We aim to relate USV rates of pairs of rats to individual activity in an automated home cage (PhenoTyper®) where USVs are recorded per pair and not per individual. We propose a composite link model approach to parametrize a mechanistic “sum‐of‐rates” model in which the pair's USV rate is the sum of the USV rates of individuals depending on their own behavior. In generalized linear models (GLMs), the individual's USV rates are multiplied. We verified through simulation that composite link model gave lower Poisson deviance than GLM. We analyzed the data from an experiment in which half of the cages did allow the pairs to interact (Pair Housing) and the other half did not (Individual Housing). The “sum‐of‐rates” model fits best for Individual Housing and GLM for Pair Housing. An additional simulation study strongly suggests that interaction between rats changes the underlying mechanism for vocalization behavior.


INTRODUCTION
Automated home-cage systems have made it possible to study spontaneous behavior of laboratory rodents. Studying laboratory rodents (here, rats) in pairs is an essential step toward understanding their social behavior and toward using rats as a model species for studying psychiatric disorders with social impairment such as autism and depression. Recently, important steps have been made to improve the individual tracking of multiple rats in a home cage. These advances allow researchers to study vital social behavior such as playing and fighting. The PhenoTyper ® system (Noldus, Wageningen, The Netherlands) tracks the whereabouts of both rats in the cage using a top-view camera and obtains their exact location in each video frame. The distance that the rat covers in the interval between two frames (further: time interval, typically 1/25 s) is used to calculate several parameters such as its velocity. With such a system, we can study multidimensional aspects of behavior (e.g., repetitive and stereotypic behaviors and social interaction, which are relevant to autism) for longer periods of time without human interference (Kas et al., 2014).
In addition to the movement of rats, ultrasonic vocalizations (USVs) are assumed to be an important part of rat behavior. USVs are indicators of the emotional state of the vocalizing rat (Brudzynski, 2009;Burgdorf et al., 2008), important for establishing and maintaining social contact such as in sexual and playing behavior (Himmler, Kisko, Euston, Kolb, & Pellis, 2014;Wöhr & Schwarting, 2013), and USVs can invoke response of other rats, for example, transmission of fear (Kim, Kim, Covey, & Kim, 2010). Emission of USVs in social settings is one of the most widely used means to study social communication in rodents (Servadio, Vanderschuren, & Trezza, 2015) and can provide more insight into the functional meaning of social behavior (Peters, Pothuizen, & Spruijt, 2015).
Using the Sonotrack ® recording system (Metris, Hoofddorp, The Netherlands), we can continuously record vocalizations of rats in the PhenoTyper ® . Interpretation of these data streams for studying autism, for example, requires development of analytical methods to establish the relation between USVs and behavior (Kas et al., 2014). As of yet, it has not been possible to assign recorded calls to individual rats when multiple rats are housed in the PhenoTyper ® .
Allocating USVs to individual rats is difficult because the "voices" of rats do not noticeably differ between individuals and because USVs recorded in (automated home) cages cannot be traced back to their location of origin because echoes of USVs can be as loud as or louder than the original USVs (R. Bulthuis, Metris, personal communication 11th of October, 2016).
Being able to record activity per individual rat and not being able to record USVs per individual rat poses a challenge when integrating both data streams. Because of this challenge, Ågmo and Snoeren (2015), for example, could not analyze the full data of their experiment on the effect of USVs on mating behavior. We address the challenge by proposing a simple mechanistic model: Sound is recorded when either one of the rats produces sound, and the recorded vocalization rate is thus the sum of the two individual rates ("sum-of-rates" model). For the individual rates in relation to the individual activity, we then assume a generalized linear model (GLM). In this way, the model becomes a composite link model (CLM) (Thompson & Baker, 1981), which we extend to be applicable to fine-scaled video frame data in which the USVs are binary.
In the first part of the paper, we show that the "sum-of-rates" model performs better in simulations than a traditional Poisson GLM if the mechanistic model is true. In the second part of the paper, the "sum-of-rates" model is applied to a case study. In the case study, data from pairs of rats that could or could not interact are analyzed. In the mechanistic model, rats in a pair are considered to be two separate entities without interaction. We show that the potential for interaction between rats fundamentally changes the relation between activity and vocalization behavior, and thus, the mechanistic model proposed in the first part of the paper no longer holds when rats have been given the potential to interact, which strengthens the claim that vocalization behavior plays a role in social interaction.

Problem definition
We wish to integrate USVs and the activity of two rats. For simplicity, let us suppose that the activity per rat (A 1 and A 2 for rat 1 and 2, respectively) is recorded for every time interval in three categories: S, L, and P, denoting Stopping (sitting still), Lingering (moving slowly), and Progressing (moving quickly), respectively. For every time interval, we evaluate whether or not a USV was detected. There exist nine combinations of activity states (A 1 A 2 ∈ States, with States = {SS, SL, SP, LS, LL, LP, PS, PL, PP}. For all these combinations, we can sum the time intervals with observed USVs ( A 1 A 2 with A 1 A 2 ∈ States) and the total number of time intervals (F A 1 A 2 with A 1 A 2 ∈ States). We are interested in estimating the cage-USV rate ( A 1 A 2 with A 1 A 2 ∈ States) (i.e., the expected number of USVs per time interval). The cage-USV rate can easily be estimated using a Poisson GLM with responses A 1 A 2 , a log-link function, nine parameters, and an offset equal to the logarithm of F A 1 A 2 . If the vocalization rates of rats are the same, then SL = LS , SP = PS , and LP = PL , so that the number of parameters reduces from nine to six. In that case, the data can be represented as in Table 1 and the corresponding model is as follows: Combined Activity: where h is the inverse of the log-link function (h(x) = exp(x)), and A 1 A 2 represents the parameter to be estimated. This Poisson GLM approach (Combined Activity GLM) is easy to implement and flexible, but we have to estimate a separate parameter for every combination of activity states and we completely disregard the fact that there exist two individual rats that both emit USVs rather than a single source. An alternative Poisson GLM model (Count GLM) is to use the activity states per rat as explanatory variables, rather than activity combinations. If the vocalization rates of both rats are identical, we require only three parameters ( S , L , and P ) and can represent the data as in Table 2.
The Count GLM model reduces the number of parameters from six to three. It still disregards the fact that there are two individual rats that both vocalize. Note that, due to the log-link function, the Count GLM model estimates A 1 A 2 as a product of (what we would like to interpret as) rates, where the rates are the exponentiated parameters. However, neither parameter has an interpretation as a rate, because rates should be summed rather than multiplied. For example, this model gives no estimate for the USV of a lingering rat.

CLM approach
Mechanistically, the USV rate of a pair of rats is the sum of the USV rates of both rats. Therefore, a more mechanistic model would estimate the USV rates per rat based on their activity and would sum the following estimates: Sum-of-rates CLM: This "sum-of-rates" model can be recognized as a CLM (Thompson & Baker, 1981). In CLM, one observation can be linked to multiple linear predictors, which allows us to link the number of time intervals with observed USVs of a cage to a separate linear predictor per rat. The exponents of the three estimated parameters, exp( S ), exp( L ), and exp( P ), can be interpreted directly as the USV rate of a rat in the Stopping, Lingering, or Progressing activity state, respectively. An alternative mechanistic interpretation for the parameters of the same statistical model is described in Appendix A. The concept of the CLM is easiest to grasp in matrix notation. The CLM model is written as follows: = Ch(X ), where C is a so-called link matrix, h is the inverse of a link function (log-link in our situation), matrix X contains the observations, and vector contains the parameters. If the link matrix is the identity matrix, the model is equivalent to a GLM. The "sum-of-rates" model is a CLM where C consists of two adjacent identity matrices and X has double the number of observations, one for every rat rather than one for every cage (see Appendix B for details).
In the special case of a model where the linear predictors are the same for both rats, the "sum-of-rates" model is equivalent to a Poisson GLM with a constant difference of log(2). For example, a CLM model for predicting USV rate as a function of cage temperature T is as follows: where i is the intercept in the "sum-of-rates" model, and * i is the updated intercept in the Poisson GLM approach.

Extension to multiple cages
Pairs of rats may vocalize with different overall frequencies. When extending our approach for multiple cages, we would therefore like to allow for a cage-specific intercept for each of the K cages (k = 1, … ,K). As a consequence, one of the activity states (Stopping) is chosen as the reference category, and S is set to zero. In the GLM approach, the exponent of the cage-specific intercept k can be interpreted as the USV rate of the cage when both rats are in the activity state Stopping. For the CLM approach (further: Count CLM), k is included in the linear predictor of both rats, and exp( k ) can thus be interpreted as the USV rate of one rat in the activity state Stopping. This extension is used in the case study below.

From counts to binary data
We record USVs as a binary variable per time interval (video frame): A 1 A 2 for a time interval is either 1 or 0, depending on whether USV is detected or not within the time interval. However, the approach we have taken so far is based on counts observed in intervals: It estimates a USV rate assuming that A 1 A 2 follows a Poisson distribution with a USV rate A 1 A 2 per time interval and an interval length F A 1 A 2 . The problem with the count approach is that it theoretically assumes that multiple USVs could occur within one time interval. This problem can be overcome by modeling the binary response per time interval using a binomial distribution with a complementary log-log-link function, which can be derived naturally from a truncated Poisson distribution (McCullagh & Nelder, 1989). For ease of notation, we drop the indices A 1 A 2 for the moment. In short, the probability that y = 1 in a time interval is equal to the probability of detecting at least one USV, that is, the probability that the number of vocalizations (N) is not equal to zero. As N is Poisson distributed, where is the vocalization rate, and t is the duration of the time interval. In a GLM model for binary data (binary GLM model), this equation results in the complementary log-log-link function (H(x) = log(− log(1 − x))), rather than the customary logistic function (McCullagh & Nelder, 1989).
We note for our model that the vocalization rate in Equation 5 is the total rate, which is the sum of the rates of the two individual rats (Equation 3). The resulting model can be expressed as a CLM model with two link functions, an inverse log-link that relates predictors to the rates and an inverse-complementary log-log-link function (Equation 5) that links the total rate to the binary observation as follows: Binary sum-of-rates CLM: where . This "sum-of-rates" model for binary data, briefly referred to as the binary CLM model, is a special case of the bilinear CLM of Thompson and Baker (1981).

Material and methods
We simulate an experiment with one cage of two rats according to the binary version of the "sum-of-rates" model as in Section 2.4. For each rat, activity data were generated as a Markov process with the three activity states as states using an input transition matrix based on the case study. USVs were generated per rat, using a Poisson process with a length of one time interval, and thereafter combined, yielding a "0" if no USVs were detected in the time interval and a "1" otherwise (the data are thus truncated at 1). The input USV rates for the Poisson process per activity state are 0.01, 0.14, and 0.04 for Stopping, Lingering, and Progressing, respectively. These USV rates were loosely based on the real data, namely, the Pair Housing group of the case study (Section 4). Every data set was analyzed using a Combined Activity GLM, a count and a binary GLM approach, and a count data and a binary CLM approach. The simulation study was repeated using 10 times higher input USV rates.
Performance of the five models was quantified in terms of deviation of the model predictions from the expected values based on the true underlying data distribution. More specifically, we calculated the Poisson deviance (PD) of the model predictions as follows: where A 1 A 2 is the expected number of time intervals with USVs based on the true underlying data distribution, and g A 1 A 2 is the model prediction of a number of USVs. We also calculated the PD of the observed data. Source code to reproduce the results is available as an online appendix (Simulation 1).

Results
The Poisson deviances of the CLM models were lower than those of the corresponding GLM models, whereas the Poisson deviance of the Combined Activity GLM was in between (Figure 1, top row). With 10 times higher input USV rates, the binary models performed much better than the count models ( Figure 1, bottom row). This was expected, as with higher USV rates, multiple USVs per time interval become more frequent than with lower USV rates, and the binary approach takes account of the discrepancy between the number of USVs (as produced by the rats) and the number of time intervals with USVs (as in the data set), and the number of USVs. The simulation results also confirm that the activity state coefficients of the CLM models can be directly interpreted as USV rate per rat. These estimates are close to the input USV rates. Over all replications, all activity states, and both scenarios, the maximum observed squared error of a USV rate estimate of the binomial CLM model was less than 0.001.

Experimental design and analysis
We observed 16 pairs of rats in automated home cages (PhenoTyper ® , Noldus, Wageningen, The Netherlands) equipped with a Sonotrack ® recording system (Metris, Hoofddorp, The Netherlands). In the first eight home cages, the two rats were separated by a wall in the middle, which did not allow interaction (Individual Housing). In the second eight home cages, the two rats were housed together and could interact (Pair Housing). For each home cage, we recorded whether or not a USV was observed and also the activity of each of the two rats (Stopping, Lingering, or Progressing) for a period of 30 min at a resolution of 25 video frames per second.
The number of observed USVs differed between the cages and was higher for Pair Housing than for Individual Housing. The number of USVs ranged from 5 to 368 for Individual Housing and from 164 to 728 for Pair Housing (Table 3). The observed vocalization rates tended to be larger when the rats were more active, and were highest when both rats were progressing. The activity  Figure 2). Data were analyzed using the following four candidate binomial CLM models: Intercept-only The USV rate per time interval differs between the cages and is independent of activity of the rats in that cage.
where i is the USV rate per time interval for each of the K cages (k = 1, … ,K), and k is the intercept term per cage.

Current Activity
The USV rate per time interval differs between the cages and is dependent on the activity of rat 1 and activity of rat 2 in that frame.
where kA 1 A 2 is the USV rate per time interval for each of the K cages (k = 1, … ,K) given the activity of rat 1 (A 1 = S, L, or P) and rat 2 (A 2 = S, L, or P); k is the intercept term per cage; L and P are the regression coefficients for Lingering and Progressing, respectively; and X L 1 , X L 2 , X P 1 , and X P 2 are indicator variables that indicate whether or not rat 1 (X A 1 ) and rat 2 (X A 2 ) are lingering (X L n ) or progressing (X P n ). Note that, as is customary for regression models with nominal variables (factors), the intercept per cage here is the expected USV rate per time interval when the rat is Stopping (i.e., the baseline USV rate), and the regression coefficients for Lingering measure the difference in log-vocalization rate between Lingering and Stopping and similarly for Progressing. Past Activity The USV rate per time interval differs between the cages and is dependent on the activity of rat 1 and activity of rat 2 in that time interval and the 24 time intervals before (1 s). The model equation is the same as Equation 9 with a different definition of all parameters except k , L , and P . In the Past Activity model, if is the USV rate per time interval for each of the K cages (i = 1, … ,K) and F-24 time intervals ( f = 25, … ,F) given the activity of rat 1 and rat 2 in the second before that time interval, and X L 1 , X L 2 , X P 1 , and X P 2 are the proportion of time interval in the previous second rat 1 (X A 1 ) and rat 2 (X A 2 ) were lingering (X L n ) and progressing (X P n ). The first 24 time intervals in the data set cannot be used because we do not have recorded activity in the second before. Averaged Past Activity The USV rate per time interval differs between the cages and is dependent on the combined activity of both rats in the second (25 frames) before that time interval. The model equation is the same as Equation 9 with a different interpretation of all parameters except k , L , and P . In the Averaged Past Activity model, if is the USV rate per time interval for each of the K cages (i = 1, … ,K) and F-24 time intervals ( f = 25, … ,F) given the averaged activity of both rats in the second (25 video frames) before that time interval; X L 1 equals X L 2 , and X P 1 equals X P 2 which are the proportion of time interval in the previous second in which rat 1 and rat 2 were lingering and progressing. The model equation can be rewritten as a GLM model where * k = k + log(2) (as in Equation 4). The best candidate model for the data set is chosen based on Akaike information criterion (AIC; the lower, the better). The Intercept-Only model had the highest AIC for both the Individual Housing and the Pair Housing data set, which confirms the hypothesized relation between activity and vocalization behavior ( Table 4). The Individual Housing and Pair Housing data set have a different best candidate model. In the Individual Housing data, the best candidate model was the Current Activity model. This model is similar to the one used in the simulation study and is

FIGURE 3 Differences (line ranges) between the average predicted probability of vocalization between time
intervals with an observed ultrasonic vocalization (colored marker) and without an observed ultrasonic vocalization (black marker). Ranges are given per cage for the Intercept-Only (purple), Current Activity (blue), Past Activity (pink), and Averaged Past Activity (green) models based on the mechanistic model. The AIC differences between the Current Activity model and the (Averaged) Past Activity model were between 6 and 8, which is indicative of a trend. For the Pair Housing data, in which interaction between rats was possible, the Current Activity model was outperformed by the Averaged Past Activity Model (ΔAIC = 404). The addition of the potential to interact thus resulted in vocalization behavior for which the mechanistic model as proposed in Section 2.4 is no longer valid. In the presence of interaction, the USVs are better predicted from the joined activity of the pair than from the activity of the individuals. Because the Averaged Past Activity Model has equal linear predictors for both rats in the pair, it has an equivalent GLM model (see Equation 4).
In addition to the abstract quantification of model quality obtained from AICs, we provide a more tangible measure for models comparison. We compare the average predicted probabilities to vocalize for time intervals with and without an observed USV for each cage (Figure 3) and interpret the difference between these probabilities as indicative for the quality of model predictions (higher difference, better predictions). The estimates of the Intercept-Only model per cage are equal to the proportion of time intervals with an observed USV. For the other models, the differences in predicted vocalization probabilities within models were larger for Pair Housing than for Individual Housing (relative to the differences between cages). The differences between models within pairs were also larger for Pair Housing than for Individual Housing; the Averaged Past Activity model was clearly superior in the Pair Housing data, whereas the Current Activity model was not clearly superior in the Individual Housing data.
The data and source code to reproduce the results are available as online supporting information (within Simulation 2).

DISCUSSION
In this paper, we presented two approaches to relate the USV rate per cage to the activities of two rats, which we evaluated in a small simulation study. In the CLM approach, like in the mechanistic model, the activity of a rat predicts its USV rate, and the sum of USV rates of both rats gives the USV rate per cage ("sum-of-rates"). In contrast, in the GLM approach, the combined activity of both rats predicts the USV rate per cage. On data simulated using the mechanistic model, the Count and Binary CLM models were better able to predict the USV rate per cage than the Count and Binary GLM. In our design, we recorded which time intervals contained a USV rather than recording the USVs itself, which explains why the Binary models predicted the USV rates better than their Count counterparts. Overdispersion was not an issue, as the USV counts were presumably near binary, and binary data cannot have overdispersion.
Application of the CLM approach in a case study on the data of rats that could not interact (the Individual Housing data) resulted in the "sum-of-rates" model (Current Activity) as the best model in terms of AIC. When analyzing data from rats that could interact (Pair Housing data), however, we found that the "sum-of-rates" model did not provide the best model in terms of AIC. Instead, a CLM model with the averaged activity of both rats as predictor (Averaged Past Activity, which has an equivalent GLM model) fitted better.
The AIC is not a formal test but rather a guideline for model selection. Models with an AIC difference lower than 2 can be considered equivalent, whereas a difference over 10 implies that they are "definitely different" (Bolker, 2007). The AIC differences for Pair Housing were all over 100 and thus different. The AIC differences for Individual Housing among the Current Activity, Past Activity, and Averaged Past Activity model were between 2 and 9, which is merely indicative of a trend. In terms of model predictions (Figure 3), the differences between models were also apparent for Pair Housing and nonpersuasive for Individual Housing. For the Pair Housing data, the AIC differences rule out the Current Activity model. In contrast, for the Individual Housing data, the Current Activity model is the best model in terms of AIC, and the Averaged Past Activity model is a close runner-up. From a biological perspective, however, because the rats cannot interact, it seems highly unlikely that rats vocalize based on the behavior of the other. We verified that we have enough power in our analysis to select the "true" model via a second simulation study. In this simulation study, we alternately assume that one of the four candidate models is true and generate data accordingly. All generated data were analyzed using all four candidate models. In almost all instances, the candidate model with the lowest AIC in the simulation was the candidate model used to generate the data and vice versa (more details and results are in Appendix C). From this simulation study, we can thus conclude that it is highly unlikely that data with an underlying model structure from one of the candidate models result in another best candidate model.
We posed a mechanistic model for the vocalization rate observed in a cage in relation to the behavior of the individual rats. This model led to a CLM (Equation 3). We thus went from a mechanistic model to a statistical model. However, the mechanistic model is not the only one leading to this specific statistical model. Another mechanistic model would be that rat 1 vocalizes only in response to the behavior of rat 2 and vice versa. This leads also to the statistical model of Equation 3. For the full set of models, see Appendix A.
Many more models than those described here could be fitted to the data. Given that the USV rate varied widely between pairs, a mixed modeling approach could be an interesting future extension of our work. For mixed modeling approaches for CLM, see Ayma, Durbán, Lee, and Eilers (2016) and Douma (2017).
In spite of our limited selection of models, we can still infer that rats that can and rats that cannot interact show a different relation between USVs and activity. More specifically, the USV rates estimated from individual behavior of rats gives the best predictor when rats are housed individually, but when rats are housed in pairs, such an individualistic model no longer gives the best predictor, and a predictor based on the combination of the rats' behavior performs better.
The USV rate of a pair of interacting rats is thus shown to be different from the sum of its parts. We conclude from this experiment that social interaction between rats changes the relation between activity and USV rate of the rats.
With this application, we have once more demonstrated the utility of the CLM approach. A small R library implementing this approach (PBCLM-package) and the R code to replicate both simulation studies are available as online supporting information. Thompson, R., & Baker, R. J. (1981). Composite link function in generalized linear models. Applied Statistics, 30 (2)

ALTERNATIVE MECHANISTIC INTERPRETATION OF THE STATISTICAL MODEL
We posed a mechanistic model for the USV rate observed in a cage in relation to the behavior of the individual rats. This model led to a statistical model: a composite link model (Equation 3 in the main text). A valid question is whether the mechanistic model is the only one leading to this statistical model. In this section, we show that it is not.
In this paper, we have assumed a mechanistic model in which the two rats in the same cage have a USV rate depending on their activity state, which sums up to the USV rate of the cage. In this model, we can interpret exp( A 1 ) as the USV rate of the first rat and exp( A 2 ) as the USV rate of the second rat. This is equivalent to saying that exp( A 1 ) is the USV rate of the cage conditional on the activity of the first rat and that exp( A 2 ) is the USV rate of the cage conditional on the activity of the second rat.
However, this is not the only possible interpretation for the parameters from the statistical model. Now, let us assume a mechanistic model in which the USVs of rats are related not only to their own activity but also to the activity of the other rat in the cage. In the most extreme case, rat 1 would vocalize solely conditional on the activity of rat 2 and vice versa. Because we have no way of detecting which rat uttered which USV, this assumption would lead to exactly the same statistical model and exactly the same estimated model parameters. The full collection of models can be written as follows: where p and q range from 1 when USVs are solely related to a rat's own behavior to 0 when USVs of a rat are solely related to the other rat's behavior (Table A1). The parameters p and q are inestimable when the observed USVs cannot be allocated to one of the rats. Now, the USV rates of the cage, conditional on the activity of the first or second rat can still be estimated by exp( A 1 ) and exp( A 2 ), respectively. However, the USV rates of rat 1 and rat 2 are given by exp( p A 1 + (1 − q) A 1 ) and exp((1 − p) A 2 + q A 2 ), respectively, which we cannot estimate.
The conclusions that we draw in the main text are based on the contrast of models that differ in their relation between activity and vocalization for rats that can and cannot interact. This conclusion is not dependent on the precise mechanistic interpretation of the CLM model.

CLM MODEL
The concept of the CLM ( Thompson & Baker, 1981) is easiest to grasp in matrix notation. In this Appendix, we first write the Count GLM model in CLM notation and thereafter the "sum-of-rates" model. For brevity, we will show matrices as if there exist only two activity states (S and L). In that case, the model matrices for the Count GLM model are as follows: .
Here, C is the identity matrix and thus Ch(X ) = h(X ), which is equivalent to Equation 2 (in the main text). The link matrix allows us to link multiple observations to one linear predictor. In order to obtain the "sum-of-rates" model of Equation 3, we set matrices X and C to the following: which is equivalent to Equation 3 (in the main text). For clarity, the separation between the observations is visualized using white space in the matrices.

APPENDIX C SIMULATION STUDY FOR AICs
In the case study, we observed that out of four candidate models, the Current Activity model had the lowest AIC on the Individual Housing data, and the Average Past Activity model had the lowest AIC on the Pair Housing data. In this simulation study, we show that it is highly unlikely that, assuming that the true model is among the candidate models, the same model holds for the Individual Housing and Pair Housing data, and we show that AIC is a suitable method for model selection in this type of data. Source code to reproduce the results is available as online supporting information (Simulation 2).

TABLE C1
For each of the eight simulation scenarios (two experimental data sets × four candidate models) and for each of the four candidate models used for analysis, the proportion of 1,000 simulations in which each candidate model has the lowest Akaike information criterion Candidate model used for analysis Simulation data set IC-Only Cur. Act. Past Act. Ave. Past Act.
In our simulation study, for every combination of the four candidate models and the two experimental data sets (Individual Housing and Pair Housing), we (i) fit the candidate model to the experimental data set to obtain an Input Model, (ii) predict the expected USV rate for every frame in the data set using the model predictions from the Input Model, (iii) generate 1,000 simulation data sets using the expected USV rates from the previous step, (iv) fit all four candidate models to the 1,000 generated simulation data sets, and (v) compare the AICs of the four candidate models used for analysis and determine which candidate model has the lowest AIC, for every of the simulated data sets.
Note that a candidate model here indicates the structure of the statistical model, not the size of coefficients. When comparing AICs of the candidate models, we always use the four AICs calculated in step (iv) of the procedure. The correct candidate model is thus the candidate model that has the same structure as the Input Model, not the exact same model.
The results of the simulation study show that AIC is a reliable tool for selecting the correct candidate model. In the vast majority of 8,000 simulation data sets, the correct candidate model had the lowest AIC. This is more often true in the data based on the Pair Housing (95.6%) than in the Individual Housing data (81.2%), and it depends on which was the correct candidate model (Table C1). In the simulation data based on the Pair Housing case study, the correct candidate model always had the lowest AIC except when the correct candidate model was the Intercept-Only model. In the simulation data based on the Individual Housing case study, the Current Activity model always had the lowest AIC when it was correct.
Differences in AIC between the correct candidate model and the other three models ( Figure C1) were larger in the Pair Housing than the Individual Housing set and differed between the different candidate models used to estimate the Input model. The order of magnitude of the AIC differences when the correct candidate model had the lowest AIC was the same as the AIC differences observed in the case study. The maximum AIC difference observed when the wrong candidate model was identified was 13.3, which is well below the observed AIC differences for the case study.
Secondly, we use the simulation to show that it is unlikely that the two data sets from the case study have the same underlying model, given that a different candidate model had the lowest AIC when analyzing this data (Current Activity for Individual Housing vs. Average Past Activity for Pair Housing). For this aim, we determine how likely it is that the candidate model with the lowest AIC is in fact the correct candidate model. If it is very likely that the candidate model with the lowest AIC is the correct candidate model, and the candidate models with the lowest AIC are not the same for both data sets, it is unlikely that the underlying "true" model is the same.
In the simulation study, in the Individual Housing data sets, when the AIC of the Current Activity model was the lowest, that candidate model was the true candidate model in 91% of simulation data sets (Table C2). The Current Activity model never had the lowest AIC when the Average Past Activity model was the true candidate model. In the Pair Housing simulation data sets, when the AIC of the Average Previous Second model is the lowest, that candidate model was the true candidate model in 96% of simulation data sets. The Average Past Activity model never had the lowest AIC when the Current Activity model was the true candidate model. It is thus both unlikely that in the case study, the true candidate model of the Individual Housing data set was the Average Past Activity model and that in the Pair Housing data set, the true candidate model was the Current Activity model.

FIGURE C1
Results of the simulation study for Akaike information criterion (AIC). Each panel represents results on 1,000 simulation data sets based on the Individual Housing (left column) or Pair Housing (right column) experiment, using each of the four candidate models (rows) to generate input data (i.e., rows represent the true models). Violin plots indicate the distribution of the difference in AIC between the Intercept-Only model (purple reference line) and the candidate model