Experiments 1–3
On most of the long-delay trials, each subject poked first into the short-delay station and then switched to the long-delay station before the long temporal interval elapsed, indicating a real-time classification of the elapsing duration, as first reported by Platt and Davis (1983) in the pigeon. Figure 2 (left panel) shows the cumulative distributions of departures from the short-delay station and arrivals at the long-delay station (on long-delay trials only) for some representative subjects from Experiments 2 and 3. The solid lines represent the cumulative distributions of departures (switch latencies), while the dashed lines represent the cumulative distributions of arrivals. The departure latency was defined as the latency (from trial onset) of the offset of the last poke in the short station prior to the onset of the first poke in the long station. The latency of this latter poke onset was defined as the arrival latency.
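The departure and arrival definitions above can be sketched in a few lines. This is only an illustration under stated assumptions: the event format (station label, onset, offset), the labels "short" and "long", and the function name are hypothetical and are not taken from the original analysis code. Trials in which the first poke is at the long station are treated here as having no switch, which is also an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical event format: (station, onset_s, offset_s), times from trial onset.
Poke = Tuple[str, float, float]


def switch_latencies(pokes: List[Poke]) -> Optional[Tuple[float, float]]:
    """Return (departure, arrival) latencies for one long-delay trial.

    departure: offset of the last short-station poke that precedes the
               onset of the first long-station poke
    arrival:   onset of that first long-station poke
    Returns None when the trial contains no short-to-long switch.
    """
    last_short_offset = None
    for station, onset, offset in sorted(pokes, key=lambda p: p[1]):
        if station == "short":
            last_short_offset = offset
        elif station == "long":
            if last_short_offset is None:
                return None  # assumption: long station visited first, no switch
            return last_short_offset, onset
    return None  # subject never reached the long station


# Made-up trial: two pokes at the short station, then one at the long station.
trial = [("short", 0.8, 1.5), ("short", 2.0, 4.6), ("long", 5.3, 9.0)]
print(switch_latencies(trial))  # (4.6, 5.3)
```

Only the first short-to-long transition is scored, which matches the first-departure and first-arrival distributions described above; later switches in the same trial are ignored by this sketch.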
The median switch latency is generally close to the geometric mean of the two delays, as is the so-called point of subjective equality in the bisection procedure when run with nonhuman animal subjects (Church & Deluty 1977; Fetterman & Killeen 1995; Platt & Davis 1983). The mean normalized medians and the mean normalized interquartile intervals (IQIs) of switch latencies are presented in Table 2.
Table 2. Mean normalized median and IQI from Experiments 1–4

Experiment and delay pair    Mean normalized median ± SE    Mean normalized IQI ± SE
Exp. 1
  2 vs. 6 seconds            1.02 ± 0.02                    0.44 ± 0.02
Exp. 2
  3 vs. 9 seconds            0.94 ± 0.04                    0.33 ± 0.03
  6 vs. 18 seconds           0.78 ± 0.06                    0.49 ± 0.03
  12 vs. 36 seconds          0.89 ± 0.02                    0.58 ± 0.05
Exp. 3
  2 vs. 6 seconds            0.82 ± 0.01                    0.21 ± 0.01
  12 vs. 36 seconds          1.04 ± 0.03                    0.37 ± 0.04
  10 vs. 60 seconds          0.83 ± 0.05                    0.46 ± 0.03
Exp. 4
  3 vs. 9 seconds
    KO                       1.00 ± 0.02                    0.66 ± 0.08
    WT                       0.98 ± 0.04                    0.66 ± 0.15
  9 vs. 27 seconds
    KO                       0.87 ± 0.02                    0.27 ± 0.03
    WT                       0.87 ± 0.06                    0.50 ± 0.13
Platt and Davis (1983) tested pigeons with durations ranging between 40 and 200 seconds. This range of durations resulted in a large number of multiple switches. We used much shorter durations (ranging between 2 and 60 seconds, with most of the durations being closer to the lower end). At these durations, multiple switches were less common. In Experiment 2, multiple switches occurred in 13–19% of 36-second trials, only 1% of 12-second trials (in just two subjects) and only 2–4% of 18-second trials (in just two subjects). In Experiment 3, multiple switches occurred in 5%, 7%, 8%, 12% and 23% of 36-second trials, depending on the subject, and in 10%, 15%, 19%, 27%, 40% and 48% of 60-second trials, again depending on the subject. We ran analyses that took these multiple switches into account in the estimation of accuracy and precision. These analyses gave results so similar to the results based simply on the cumulative distributions of first departures and first arrivals that we do not report them.
Switch latencies are approximately scale invariant: when the time axis is normalized, the plots look similar over a wide range of delay pairs (given at upper left of each panel). Scale invariance is equivalent to Weber's law; it implies that where the median lies in relation to the two delays, which is our measure of accuracy, and the ratio of the interquartile interval to the median, which is our measure of precision, are both independent of the absolute values of the delay pairs. In Experiment 2, each of the four subjects was tested with three different pairs of delays. The delays differed between the pairs by a factor of as much as 6, but the within-pair ratio of the short to the long delay was kept at 1:3. If Weber's law (the scale invariance of discriminability) holds, then the psychometric functions for different duration pairs should be the same on a normalized scale. It is evident from visual inspection that this property holds, at least to a first approximation.
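One way to write out what scale invariance implies for the two measures is given below. The symbols S, L, G, T and the common distribution function Φ are notation introduced here for illustration only, and normalizing by the geometric mean follows the normalization used for Table 2. The block assumes the amsmath package for \operatorname and \text.

```latex
% S, L: short and long delays; G: their geometric mean; T: switch latency.
% \Phi: a hypothetical distribution function shared by all delay pairs
%       that have the same within-pair ratio.
\[
  G = \sqrt{S L}, \qquad
  \Pr(T \le t \mid S, L) = \Phi\!\left(\frac{t}{G}\right)
\]
% With q_p the p-th quantile of \Phi, the normalized measures are constants:
\[
  \frac{\operatorname{median}(T)}{G} = q_{0.5}, \qquad
  \frac{\operatorname{IQI}(T)}{G} = q_{0.75} - q_{0.25}, \qquad
  \frac{\operatorname{IQI}(T)}{\operatorname{median}(T)}
    = \frac{q_{0.75} - q_{0.25}}{q_{0.5}}
\]
% Example: G = \sqrt{3 \cdot 9} \approx 5.20 s for the 3 vs. 9 second pair and
% G = \sqrt{12 \cdot 36} \approx 20.78 s for the 12 vs. 36 second pair.
```

Under this reading, a change in the delay pair rescales the time axis by G but leaves the normalized median and the normalized spread unchanged.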
We did a relative likelihood analysis (Glover & Dixon 2004; Kass & Raftery 1995) to test the within-subject scale invariance of the distributions. In this analysis, we fit the normalized switch latencies with Weibull distributions, either separately for each phase (see left four plots in Fig. 3) or after pooling the data from all three phases (see top right plot in Fig. 3). In the former case, we summarize or represent the data with a six-parameter model, a model that assumes that the data from each phase come from a different distribution. In the latter case, we summarize or represent the data with a two-parameter model, assuming that they all come from the same distribution. In computing each fit, we get the maximum log likelihood for the data represented by that fit (how likely those data are, given that model of them). The overall maximum log likelihood for the three-distribution model is the sum of the individual maximum log likelihoods. The maximum likelihood for the three-distribution model may be compared with the maximum likelihood for the one-distribution model, using the Schwarz criterion (Schwarz 1978) to adjust for the difference in the numbers of parameters:
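The two fits described above can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' fitting routine: the function name `weibull_mle`, the bisection solver for the shape parameter, and the latency values are all assumptions made for the example. It assumes positive latencies that are not all identical.

```python
import math
from typing import List, Tuple


def weibull_mle(x: List[float]) -> Tuple[float, float, float]:
    """Maximum-likelihood fit of a two-parameter Weibull distribution.

    Returns (shape, scale, maximum log likelihood).
    """
    n = len(x)
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / n

    def score(k: float) -> float:
        # Profile-likelihood equation for the shape; its root is the MLE.
        xk = [v ** k for v in x]
        weighted = sum(a * b for a, b in zip(xk, logs)) / sum(xk)
        return weighted - 1.0 / k - mean_log

    lo, hi = 1e-3, 1.0
    while score(hi) < 0:      # expand the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):      # plain bisection; score is increasing in k
        mid = 0.5 * (lo + hi)
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    shape = 0.5 * (lo + hi)
    scale = (sum(v ** shape for v in x) / n) ** (1.0 / shape)
    loglik = (n * math.log(shape) - n * shape * math.log(scale)
              + (shape - 1.0) * sum(logs)
              - sum((v / scale) ** shape for v in x))
    return shape, scale, loglik


# Made-up normalized switch latencies for three phases of one subject.
phases = [
    [0.71, 0.85, 0.92, 0.98, 1.05, 1.12, 1.30],
    [0.60, 0.74, 0.81, 0.90, 0.97, 1.10, 1.25],
    [0.66, 0.80, 0.88, 0.95, 1.02, 1.15, 1.40],
]

# Six-parameter model: one Weibull per phase, log likelihoods summed.
ml_separate = sum(weibull_mle(p)[2] for p in phases)

# Two-parameter model: a single Weibull fit to the pooled data.
pooled = [v for p in phases for v in p]
ml_pooled = weibull_mle(pooled)[2]

print(ml_separate, ml_pooled)
```

Because the pooled model is a special case of the separate-phase model, the summed log likelihood of the separate fits can never be lower than that of the pooled fit, which is why a penalty for the extra parameters is needed before the two are compared.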
Odds ratio (Model 1 : Model 2) = exp[ML_{1} − ML_{2} − (1/2)(d_{1} − d_{2}) log(n)],
where ML_{1} and ML_{2} are the respective maximum log likelihoods, (d_{1} − d_{2}) is the difference in the number of parameters (= 4, i.e. six parameters for the three-distribution model versus two for the one-distribution model), and n is the number of data points (switches). The odds ratio is also called the Bayes factor; it is the relative likelihood of the different models vis-à-vis the data. It quantifies how much better one or the other model is at representing the data.
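As a worked illustration of the odds-ratio formula, the short function below evaluates it for made-up inputs. The function name and the numbers are hypothetical, and the natural logarithm is assumed for log(n), as is usual for the Schwarz criterion.

```python
import math


def schwarz_odds_ratio(ml1: float, ml2: float, d1: int, d2: int, n: int) -> float:
    """Odds ratio (Model 1 : Model 2) with the Schwarz penalty.

    ml1, ml2: maximum log likelihoods of the two models
    d1, d2:   numbers of free parameters in the two models
    n:        number of data points (switches)
    """
    return math.exp(ml1 - ml2 - 0.5 * (d1 - d2) * math.log(n))


# Made-up values: the six-parameter model fits better by 2 log units
# but pays a penalty for its 4 extra parameters over n = 100 switches.
odds = schwarz_odds_ratio(ml1=-10.0, ml2=-12.0, d1=6, d2=2, n=100)
print(odds)        # below 1, so the simpler model is favored
print(1.0 / odds)  # the same odds expressed in favor of Model 2
```

With these inputs the exponent is 2 − 2·ln(100), so the odds ratio equals e²/100², well below 1: the small gain in fit does not justify the four additional parameters.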
In these analyses we considered data only from sessions 10–19, 2–10 and 6–21 for Phases 1, 2 and 3, respectively. For Subjects 2 and 4, the single distribution (i.e. the scale-invariant model) was more likely than the multi-distribution model, whereas for the two other subjects, the reverse was true. Thus, scalar variability holds more or less exactly within some subjects, whereas, with the extreme ranges we tested, there are some modest violations of it in some other subjects.
We compared the medians and interquartile intervals across different phases. The tendency is for the distributions obtained using shorter duration pairs (3 and 9 seconds) to be steeper (have narrower spread) than the distributions with the longer duration pairs (6 and 18 seconds and 12 and 36 seconds). The mean normalized medians (medians divided by the geometric mean of short and long durations) did not differ, but the mean normalized spreads (interquartile intervals divided by the geometric mean of short and long durations) differed across different duration pairs [F(2, 6) = 6.82, P = 0.03, repeated-measures ANOVA]. Similar violations of scalar variability are seen with the peak procedure when subjects are tested over a similarly large range of intervals (Gallistel et al. 2004b; E. A. Ludvig, F. Balci & K. M. Longpre, in review).
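The normalization just described (median and interquartile interval, each divided by the geometric mean of the two delays) can be sketched as below. The function name and the latency values are hypothetical, and the quartile convention (Python's "inclusive" interpolation) is an assumption, since the text does not say which convention was used.

```python
import math
import statistics
from typing import List, Tuple


def normalized_summary(latencies: List[float],
                       short: float, long: float) -> Tuple[float, float]:
    """Return (normalized median, normalized IQI) of switch latencies."""
    gm = math.sqrt(short * long)  # geometric mean of the two delays
    q1, q2, q3 = statistics.quantiles(latencies, n=4, method="inclusive")
    return q2 / gm, (q3 - q1) / gm


# Made-up switch latencies (seconds) for a 3 vs. 9 second delay pair.
latencies = [4.0, 4.5, 5.0, 5.2, 5.5, 6.0, 7.0]
norm_median, norm_iqi = normalized_summary(latencies, short=3.0, long=9.0)
print(norm_median, norm_iqi)
```

A normalized median near 1 means the median switch latency sits at the geometric mean of the two delays; the normalized IQI expresses the spread in the same units, so values from different delay pairs can be compared directly.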
In addition to the analysis reported above, we also applied the likelihood analysis to each phase after collapsing the data across the four subjects, and to the pooled data after collapsing across the four subjects and three phases (see the bottom right plot in Fig. 3). For data combined in this manner, the pooled model was a somewhat better model than the unpooled model; the Bayes factor was 8:1 in favor of the model that assumes no effect of a change in time scale.
Importantly, these measures of timing accuracy and precision are very similar not only across a wide range of intervals but also from one normal (wild-type) subject to the next, as is evident from the small standard errors for the mean normalized medians and the mean normalized spreads (interquartile intervals) given in Table 2. This is in marked contrast to commonly used measures of strength, latency or rate of performance. Between-subject differences of a factor of 10 or more are common in such measures.
In tasks intended to be used for screening large numbers of genetically manipulated subjects, efficiency is an important consideration: how much training is required? In the right panel of Fig. 2, we plot session by session the cumulative number of switches (dashed stair plots against right ordinates) and, on the same panels, the quartiles (25%, 50% and 75% points) of the cumulative distribution of those switches (solid plots against left ordinates). The plots of the quartiles begin only with the session in which the cumulative number of switches became equal to or greater than 8. For most subjects, these session-by-session plots cover only the first 10 sessions because a clear and stable distribution had developed in that span. The final quartiles (based on all the switches) and the cumulative number of switches across all sessions are given at the extreme right of each plot.
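The session-by-session summary described above can be sketched as a running computation over the accumulated switches. The function name, the data layout (one list of switch latencies per session) and the quartile convention are assumptions made for illustration; only the threshold of 8 cumulative switches comes from the text.

```python
import statistics
from typing import List, Optional, Tuple

Quartiles = Tuple[float, float, float]


def running_quartiles(sessions: List[List[float]],
                      min_switches: int = 8
                      ) -> List[Tuple[int, Optional[Quartiles]]]:
    """For each session, return (cumulative switch count, quartiles or None).

    Quartiles are computed over all switches accumulated so far and are
    reported only once the cumulative count reaches min_switches.
    """
    cumulative: List[float] = []
    out: List[Tuple[int, Optional[Quartiles]]] = []
    for session in sessions:
        cumulative.extend(session)
        count = len(cumulative)
        if count >= min_switches:
            q1, q2, q3 = statistics.quantiles(cumulative, n=4,
                                              method="inclusive")
            out.append((count, (q1, q2, q3)))
        else:
            out.append((count, None))
    return out


# Made-up switch latencies (seconds) for three consecutive sessions.
sessions = [[5.0, 6.0], [4.0, 5.5, 6.5], [5.2, 5.8, 6.1, 4.9]]
result = running_quartiles(sessions)
for count, quartiles in result:
    print(count, quartiles)
```

Plotting the three quartile series against session number, alongside the cumulative count, gives the kind of acquisition curve described for the right panel of Fig. 2.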
Switching emerged sooner in subjects trained with the longer intervals (12 and 36 seconds or 10 and 60 seconds) than in those trained with the shortest (2 and 6 seconds). In Experiment 3, the six subjects trained with the longer intervals had all made 30 or more switches by the end of the seventh session, while it took between 9 and 15 sessions for the subjects trained with the shortest intervals to reach that point. When naive mice are trained with these longer intervals (and sometimes even when trained with the shorter intervals; see Experiment 2, Subjects 1 and 4), they begin to switch as early as the second or third session, and the quartiles of the cumulative switch distribution approximate their final values within six sessions. When subjects were switched to a new pair of intervals, they adjusted within a single session, provided the change was by a factor of no more than 2, in which case the quartiles of the new cumulative switch function approximated their final values after only three sessions (see Phase 2 of Experiment 2, right panels in Fig. 2).