Sleep does not aid the generalisation of binocular disparity‐based learning to the other visual hemifield

Visual perceptual learning refers to long‐lasting performance improvements on a visual skill – an ability supported by plastic changes in early visual brain areas. Visual perceptual learning has been shown to be induced by training and to benefit from consolidation during sleep, presumably via the reactivation of learning‐associated neuronal firing patterns. However, previous studies have almost exclusively relied on a single paradigm, the texture discrimination task, on which performance improvements may rely on higher‐order rather than lower‐level perceptual skills. In the present study, we tested whether sleep has beneficial effects on a visual disparity discrimination task. We confirm previous findings in showing that the ability to discriminate different disparities is unaffected by sleep during a 12‐hr retention period after training. Importantly, we extend these results by providing evidence against an effect of sleep on the generalisation of improved disparity discrimination across the vertical meridian. By relying on a between‐subject design, we further exclude carry‐over effects as a possible confound present in previous findings. These data argue against sleep as an important factor in the consolidation of a low‐level perceptual skill. This sets important constraints on models of the role of sleep and sleep‐associated neural reactivation in the consolidation of non‐declarative memories.


| INTRODUCTION
Recently acquired information is often susceptible to distortion and loss. A subset of novel memories must thus undergo an active process of consolidation (Squire et al., 1984). This process, often referred to as 'systems consolidation', occurs mainly during post-learning sleep (Klinzing et al., 2019). Successful consolidation reduces the probability that information is forgotten. It also induces qualitative changes to the information, often including improved generalisation. For visual learning, this may encompass a higher invariance to specific properties of a stimulus, such as its location in the visual field.
Visual learning is commonly investigated using the classical texture discrimination task (TDT). Improvements on this task tend to be confined to the trained eye or region of the visual field. This suggests that visual learning is supported by synaptic changes in the early stages of processing, where information is retinotopically organised (Karni & Sagi, 1991). This notion is supported by blood-oxygen-level-dependent responses to the TDT, which are higher in the visual cortex when stimuli are presented to the trained compared to the untrained eye (Schwartz et al., 2002). A series of studies showed benefits of sleep on TDT improvements (Deliens et al., 2014; Gais et al., 2000; Karni et al., 1994; Stickgold et al., 2000; Tamaki et al., 2020), including the ability to generalise to the untrained eye (Deliens et al., 2014). Due to the retinotopic organisation of early visual areas, local mechanisms are unlikely to account for this effect, suggesting a role of top-down feedback during consolidation. Coordinated memory reactivation between visual cortex and hippocampus (Ji & Wilson, 2007), as well as thalamus (Durkin et al., 2017), has been shown in sleeping animals, potentially reflecting such a consolidation mechanism.
As a major caveat to this idea, sleep benefits on visual perceptual learning have almost exclusively been demonstrated on the TDT. Importantly, difficulty in the TDT is adjusted by changing the time between the stimulus and a subsequent mask, not by changing properties of the stimulus itself. Improvements on the task may therefore result from better temporal allocation of attentional resources. Consistent with this idea, training subjects on temporal aspects of the task beforehand has been shown to minimise further improvements (Wang et al., 2013) and to improve generalisation (Xiong et al., 2016). This suggests that learning and consolidation in the TDT may involve higher levels of visual processing than is commonly assumed.
To specifically investigate perceptual learning and its generalisation at the lowest levels of the visual system, we employed a task that trains the ability to assess depth by using disparities between the images received by the two eyes. The first signs of binocular disparity processing are found in the primary visual cortex (DeAngelis, 2000).
We define perceptual learning as retinotopically local performance improvements following repeated exposure to this low-level task.
We define generalisation as performance improvements at untrained retinotopic locations. Note that this view of generalisation is restricted in that it does not account for the possibility that learning may generalise to new retinotopic locations, while remaining specific to the trained location in external space. Furthermore, we assessed generalisation in terms of immediate post-consolidation improvements in performance at the untrained location. However, generalisation may also be reflected in faster performance gains at that location. As is the case for the seminal studies described above, our protocol does not allow distinguishing between these different forms of generalisation.
In a previous study using the same task, we did not find evidence of sleep-associated retinotopically local improvements or their generalisation across the horizontal meridian of the visual field (Klinzing et al., 2020). However, our previous results might have been confounded by differences in baseline performance between the lower and upper hemifield (Skrandies, 1987), such that worse performance in the upper visual field could have masked generalisation. To exclude such confounding, in the present study we tested generalisation across the vertical instead of the horizontal meridian, i.e. from the left to the right visual field or vice versa. Furthermore, we addressed potential carry-over effects between wake and sleep conditions by testing this factor across groups instead of within subjects.
Extending the results from our previous study, we present evidence against a generalisation of binocular disparity learning across the vertical meridian. While this does not exclude the existence of local generalisation to close-by receptive fields, our data suggest that early perceptual learning devoid of temporal cues is neither subject to long-range generalisation by top-down feedback, nor does it depend on sleep.

| Participants
A total of 32 participants (17 female, 15 male; mean [± SD, range] age 23.66 [3.25, 18-30] years) were trained on a binocular disparity discrimination task used previously (Klinzing et al., 2020). Nine additional subjects started the experiment but were excluded because they were not able to see the depth effect and thus performed at around chance level (n = 7), slept during the retention interval despite being assigned to the wake group (n = 1), or as a result of technical difficulties (n = 1). Subjects were required to score >1.0 in a decimal visual acuity test, corresponding to a Snellen acuity of 20/20. Exclusion criteria were health problems, ongoing medication, medical interventions, night or shift work, examination periods, stress-intense occupations during the 3 weeks prior to the experiment, or a history of psychiatric, neurological, or sleep disorders. On experimental days, extensive physical exercise, daytime naps, as well as the consumption of alcohol, caffeine, or illegal drugs were prohibited. The study was conducted in accordance with the Declaration of Helsinki and approved by the local Ethics Committee of the Medical Faculty of the Universität Tübingen. All subjects gave their written informed consent.

| Binocular disparity discrimination task
Participants performed a two-choice disparity discrimination task in the lower visual field (Figure 1a). They discriminated whether a central disc (3° diameter) was protruding ("near") or receding ("far") relative to a ring (1° wide) surrounding the disc. Trials were performed at different difficulty levels, varied by the disc's signal strength (see below). Between trials, only a fixation cross was shown at the centre of the screen. After initiating a trial with a button press, the stimulus (disc and ring) was shown for 1.5 s with a horizontal (left or right depending on condition) and vertical offset (lower visual field) to the fixation cross of 3°. The stimulus and fixation cross then disappeared and were replaced for a maximum of 2 s by two-choice targets (small versions of a near and a far stimulus at full signal strength; 2.2° disc diameter; 0.8° ring width) above and below the previous location of the fixation cross. Participants were asked to pick the stimulus they had just seen using an up or down button press (match-to-sample). Eye fixation was assessed using an infrared eye tracker during the interval between stimulus onset and choice target onset (Eyelink 1000, SR Research) at a sampling rate of 500 Hz. Stimuli were shown on a back-projection screen using two projectors (Projection Design F21 DLP, 60 Hz, 1,920 × 1,080 pixels, 70.5 cm image width, 225 cd/m² mean luminance, linearised, 80 cm viewing distance) and passive linear polarising filters with a relative tilt of 90°. Participants wore passive linear polarising filter glasses, allowing us to present slightly different images to each eye. The disparity between the two images resulted in the perception of depth, which we used to adjust the relative distance of the central disc to the outer ring.
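The stimulus dimensions above are given in degrees of visual angle. As a rough illustration of what they imply on the screen, the following sketch converts them to pixels from the reported image width, horizontal resolution, and viewing distance. It assumes a flat screen, square pixels, and an object centred on the line of sight (the actual stimulus was offset from fixation), so the results are approximations. The function name deg_to_px is hypothetical and not taken from the original task scripts.

```python
import math

SCREEN_WIDTH_CM = 70.5      # image width reported in the text
SCREEN_WIDTH_PX = 1920      # horizontal resolution reported in the text
VIEWING_DISTANCE_CM = 80.0  # viewing distance reported in the text


def deg_to_px(angle_deg: float) -> float:
    """Approximate on-screen extent (pixels) of an object subtending angle_deg.

    Assumes a flat screen and an object centred on the line of sight.
    """
    size_cm = 2.0 * VIEWING_DISTANCE_CM * math.tan(math.radians(angle_deg) / 2.0)
    return size_cm * SCREEN_WIDTH_PX / SCREEN_WIDTH_CM


disc_px = deg_to_px(3.0)       # central disc diameter
disparity_px = deg_to_px(0.1)  # magnitude of the target disparity

print(f"disc diameter: {disc_px:.1f} px")
print(f"target disparity: {disparity_px:.2f} px")
```

Under these assumptions, the 3° disc spans a little over 100 pixels, and the 0.1° target disparity corresponds to a horizontal image shift of only a few pixels.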
The disc and ring were assembled from circular dynamic random dot stereograms, consisting of equal numbers of black and white dots. The outer ring was always shown at 0° disparity. For each video frame (shown for 1/60 s), all dots belonging to the central disc had identical disparities. Target disparities (±0.1° for "near" and "far") were well above the detection threshold and thus easily detectable in the absence of noise. Task difficulty was manipulated by varying the proportion of video frames showing dots at the target disparity (% signal strength). For the remaining video frames, disparity values were randomly drawn from a uniform distribution of 11 values in 0.05° increments from −0.25° to 0.25°. For 0% signal trials, disparity values for all video frames were drawn from the noise distribution and random feedback was given. We ran the task with custom scripts (cf. Seillier et al., 2017) in MATLAB 2014a (MathWorks) and Psychophysics Toolbox 3 (Kleiner et al., 2007; Peli, 1997).
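The signal-strength manipulation described above can be sketched as follows. This is a minimal illustration, not the original MATLAB code: it assumes that a trial contains 90 frames (1.5 s at 60 Hz), that the number of signal frames is fixed exactly rather than drawn per frame, and that signal and noise frames are randomly interleaved. None of these details is stated in the text, and the function name frame_disparities is hypothetical.

```python
import random

FRAME_RATE_HZ = 60          # projector refresh rate reported in the text
STIMULUS_DURATION_S = 1.5   # stimulus duration reported in the text

# Noise distribution: 11 values from -0.25 to 0.25 degrees in 0.05-degree steps.
NOISE_DISPARITIES = [round(-0.25 + 0.05 * i, 2) for i in range(11)]


def frame_disparities(signal_strength, target_disparity, rng):
    """Return one disparity value (degrees) per video frame of a trial.

    signal_strength: proportion of frames (0 to 1) shown at the target disparity.
    target_disparity: +0.1 or -0.1 degrees (which sign is "near" is an assumption
    left open here).
    """
    n_frames = round(FRAME_RATE_HZ * STIMULUS_DURATION_S)
    n_signal = round(signal_strength * n_frames)
    frames = [target_disparity] * n_signal
    # Remaining frames are drawn uniformly from the noise distribution.
    frames += [rng.choice(NOISE_DISPARITIES) for _ in range(n_frames - n_signal)]
    rng.shuffle(frames)  # assumed: signal and noise frames are interleaved
    return frames


trial = frame_disparities(0.5, 0.1, random.Random(0))
```

Because the noise distribution contains the target values ±0.1° themselves, a noise frame can coincide with the target disparity by chance, so the count of target-disparity frames in a trial is at least, not exactly, the nominal signal proportion.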
Performance was modelled using cumulative Gaussian functions. The resulting psychometric curves describe the ratio of "far" answers (represented on one axis) for each of the sampled signal strengths (represented on the other axis). At one extreme, trials with a full-strength "near" signal usually lead to almost no "far" answers. At the other extreme, trials with a full-strength "far" signal lead to almost exclusively "far" answers. For these and all signal values in between, the ratio of "far" answers can be adequately captured by a sigmoid curve. From this psychometric function, three parameters were derived. The Slope of the function is the main metric for perceptual performance. Assessed at the signal strength leading to 50% correct responses, it reflects how quickly performance improves with higher (and declines with lower) signal strength. A related measure is the psychometric Threshold, which is defined as the signal strength required for 85% performance (here we use the average of the unsigned values for near and far trials). The third parameter we analysed is the observer's Bias. On trials where no signal is given, an unbiased observer would make the same number of "far" and "near" decisions. For real observers, this equilibrium is usually located at a signal strength that deviates from 0. This signal strength is considered the observer's Bias.
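To make the three parameters concrete, the sketch below derives them from a cumulative Gaussian with mean mu and standard deviation sigma. It is a simplified illustration under several assumptions: signal strength is expressed as a signed proportion (negative for "near", positive for "far"), lapse rates are ignored, and the Slope is taken as the derivative of the curve at its midpoint. The values of mu and sigma are hypothetical, and the fitting procedure actually used in the study may define these quantities differently in detail.

```python
from statistics import NormalDist


def psychometric_summary(mu: float, sigma: float) -> dict:
    """Slope, Threshold, and Bias of a cumulative Gaussian psychometric curve.

    The curve gives the proportion of "far" answers as a function of signed
    signal strength (assumed: negative = near, positive = far).
    """
    curve = NormalDist(mu, sigma)
    # Derivative of the CDF at its midpoint, equal to 1 / (sigma * sqrt(2 * pi)).
    slope = curve.pdf(mu)
    # Signal strengths giving 85% "far" answers and 15% "far" (85% "near") answers.
    far_threshold = curve.inv_cdf(0.85)
    near_threshold = curve.inv_cdf(0.15)
    threshold = (abs(far_threshold) + abs(near_threshold)) / 2.0
    # Bias: the signal strength at which "far" and "near" answers are equally likely.
    return {"slope": slope, "threshold": threshold, "bias": mu}


summary = psychometric_summary(mu=0.05, sigma=0.2)  # hypothetical observer
print(summary)
```

In this simplified form, a smaller sigma yields both a steeper Slope and a lower Threshold, which is why the two measures are closely related, while the Bias shifts the curve along the signal axis without changing its steepness.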

| Experimental timeline
All subjects participated in an Acquisition and a Testing session, which were scheduled 12 hr apart and consisted of three phases [...] and went through Testing in the morning. Participants in the wake group started Acquisition in the morning, stayed awake, and completed Testing in the evening. We verified that subjects in the wake group did not nap during retention using activity trackers with acceleration and light sensors (Actiwatch, Philips Respironics). The same devices were used to assess sleep duration in the sleep group.
Any sleep in the wake group or sleep for <6 hr in the sleep condition led to the exclusion of the subject (n = 1). Subjects showed an increase in sleepiness across a wake retention interval (Holm-corrected p = .011, d = 3.469) and no change across a sleep retention interval (p = 1.0; interaction Acquisition/Testing × Sleep, p = .011, η² = 0.079).

FIGURE 1 Learning task and experimental design. (a) Binocular disparity cues had to be exploited to differentiate two stimulus types ("near" versus "far"). (b) All phases of the task were performed on one side of the lower visual field, except for the Generalisation test, which was performed on the other side (balanced design). (c) Each participant went through an Acquisition and Testing session with a total of six experimental phases. Acquisition (Warmup, Baseline and Training) and Testing (Refresher, Retrieval and Generalisation) were separated by a 13-hr retention interval, resulting in a 12-hr delay between Training and Retrieval. The wake group started Acquisition in the morning (08:30 hours), stayed awake during the retention interval, and was tested in the evening (21:30 hours). The sleep group started in the evening (20:30 hours), slept during retention, and completed the task in the morning (09:30 hours). The speaker symbol indicates whether tones signalled a correct or incorrect answer. This auditory feedback was given only during Acquisition. Phases used to assess performance are colour-coded.

| Data analysis
We employed Bayesian inference (psignifit 4 toolbox for Matlab, https://github.com/wichmann-lab/psignifit) for estimating psychometric functions for Baseline, Training, Retrieval, and Generalisation by fitting a beta-binomial model to the correct responses at each signal strength (Schütt et al., 2016). Based on the fitted cumulative Gaussian functions (see "Binocular disparity discrimination task"), we analysed the parameters Slope, Threshold, and Bias. On average,

| Performance in the binocular disparity task improves locally
Task performance, as measured by the slope of the psychometric function, changed over the course of the experiment across both groups (mixed repeated measures ANOVA, main effect Phase, F(3,90) = 9.818, p < .001, η² = 0.092). Performance increased from Baseline to Retrieval, which was reflected by a progressive steepening of the slope (Figure 2, top)

FIGURE 2 Sleep does not aid the transfer of improvements in binocular disparity to the other visual field. We compared the slope of the psychometric functions fitted to each participant's performance between experimental phases and across groups (higher values denote an improvement in performance; for other parameters, see main text). Top: Large light unfilled and dark filled circles show grand averages (±95% confidence intervals) for the wake and sleep group, respectively. Small circles show single-subject data. Bottom: Training progress, illustrated by splitting Training trials into three consecutive blocks of 260 trials each. Performance improved only descriptively from Baseline to Training (post hoc t test, p = .152, Holm-corrected), improved significantly between Training and Retrieval (p = .006) and deteriorated when Generalisation was tested in the other visual field (p = .008; ANOVA main effect Phase p < .001). Performance did not differ between wake and sleep groups (ANOVA main effect Wake/Sleep, p = .688; interaction Wake/Sleep × Phase, p = .505) or across Training blocks (p = .567). **p < .01; ***p < .001
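The post hoc comparisons reported here are Holm-corrected. For readers unfamiliar with the procedure, the following sketch shows the standard Holm step-down adjustment. The uncorrected p values in the example are hypothetical and are not taken from the study, and the function name holm_adjust is likewise an illustrative choice.

```python
def holm_adjust(p_values):
    """Holm step-down adjustment; returns adjusted p values in the input order."""
    m = len(p_values)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


raw = [0.01, 0.04, 0.03]     # hypothetical uncorrected p values
corrected = holm_adjust(raw)
print(corrected)
```

Compared with a plain Bonferroni correction, which multiplies every p value by the number of tests, the step-down scheme is less conservative for all but the smallest p value while still controlling the family-wise error rate.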

| No signs for improved generalisation after sleep
So far, we have demonstrated that across groups, performance improved with task exposure and over the retention period but dropped to baseline levels when the stimulus was presented on the other side of the lower visual field. By now focussing on differences between groups, we analysed whether sleep affected performance, including the participants' ability to generalise their improvements to the other visual hemifield. In contrast to our previous study (Klinzing et al., 2020) [...]
To substantiate the notion that sleep does not influence training and generalisation success, we performed Bayesian ANOVAs on all parameters. As Bayesian inference is based on model comparison, this allowed us to quantify the empirical evidence in favour of the null hypothesis, something that is conceptually impossible using classical statistics. Analyses were conducted using JASP's default priors (r scale = .5 for fixed effects and 1.0 for random effects).
Given the highly significant main effects of Phase, we included this factor into the respective null models and compared them to full models that additionally contained the main effect of Wake/Sleep and its interaction with Phase. These analyses provided moderate to strong evidence in favour of the null models (Slope, BF01 = 13.15; Threshold, BF01 = 4.11; Bias, BF01 = 23.10). Subsequent analyses of individual effects indicated mostly anecdotal evidence against a main effect of Wake/Sleep (Slope, BFexcl = 2.35; Threshold, BFexcl = 1.83; Bias, BFexcl = 3.31), but more reliable evidence against the interaction of Wake/Sleep and Phase (Slope, BFexcl = 5.46; Threshold, BFexcl = 2.23; Bias, BFexcl = 7.05). This pattern of results was even more pronounced when the first session of all 17 participants from our previous study (Klinzing et al., 2020) was included in the analysis (Slope, BF01 = 19.08; Threshold, BF01 = 11.73; Bias, BF01 = 28.90), with moderate evidence against both the main effect of Wake/Sleep (Slope, BFexcl = 3.15; Threshold, BFexcl = 2.94; Bias, BFexcl = 3.45) and its interaction with Phase (Slope, BFexcl = 6.05; Threshold, BFexcl = 3.99; Bias, BFexcl = 8.37). Both classical and Bayesian statistics are thus compatible with a lack of an important effect of sleep on the generalisation of improvements in coarse disparity discrimination. This result has now proved consistent across two independent sets of participants (total n = 49).
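As an aid to reading these Bayes factors, the sketch below converts a Bayes factor in favour of the null model into a posterior probability and attaches a verbal label. Two assumptions are made that the text does not state: equal prior probabilities for the null and the full model, and the commonly used cutoffs of 3, 10, 30, and 100 for the verbal categories. The function names are hypothetical.

```python
def posterior_prob_null(bf01: float, prior_prob_null: float = 0.5) -> float:
    """Posterior probability of the null model given BF01 and a prior probability."""
    prior_odds = prior_prob_null / (1.0 - prior_prob_null)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)


def evidence_label(bf: float) -> str:
    """Verbal label for a Bayes factor >= 1 (assumed conventional cutoffs)."""
    if bf < 1.0:
        raise ValueError("invert the Bayes factor so that it is at least 1")
    if bf < 3.0:
        return "anecdotal"
    if bf < 10.0:
        return "moderate"
    if bf < 30.0:
        return "strong"
    if bf < 100.0:
        return "very strong"
    return "extreme"


# Bayes factors for the null models as reported in the text above.
for name, bf01 in [("Slope", 13.15), ("Threshold", 4.11), ("Bias", 23.10)]:
    print(name, evidence_label(bf01), round(posterior_prob_null(bf01), 3))
```

Under these assumptions, the reported Slope value of 13.15 means the data are about 13 times more likely under the null model than under the full model, which corresponds to a posterior probability of roughly 0.93 for the null model when both models are considered equally likely beforehand.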
All performance data and statistical analyses have been uploaded to an Open Science Framework repository and are publicly available: https://osf.io/zqkuw/ (doi: 10.17605/OSF.IO/ZQKUW).

| DISCUSSION
The data reported in the present study confirm our previous findings based on the same stimuli and task (Klinzing et al., 2020). They show substantial improvements in coarse binocular disparity discrimination following a single training session. Our present results also corroborate that these learning effects are retinotopically local, i.e. they do not transfer between different visual hemifields. In our previous study, we investigated transfer from the lower to the upper hemifield (Klinzing et al., 2020). The lack of generalisation observed there could at least partly have been the result of anatomical and functional anisotropies between the upper and lower visual hemifields (Herde et al., 2020; Skrandies, 1987). In the present study, we addressed this potential confound by assessing transfer between the left and right hemifields and observed qualitatively and quantitatively similar effects. We again found that sleep during the retention interval did not affect subsequent performance, neither within the trained visual-field quadrant nor at the transfer location. These results suggest that perceptual learning of coarse disparity discrimination depends on retinotopically specific mechanisms that do not benefit from memory consolidation during subsequent sleep.
While performance increased significantly over the retention period, this increase did not differ between sleep and wakefulness, and improvements remained local in both experimental groups.
Thus, disparity discrimination learning does not seem to be amenable to transfer either via sleep-dependent extraction of higher-level regularities (Zhang et al., 2010), improved temporal allocation of attention (Wang et al., 2013), or other sleep-dependent top-down processes. Results may differ in paradigms in which the temporal structure within trials supports task performance or in which the task involves stimuli that are either static or change on a time scale comparable to natural scenes. It is also worth noting that we defined generalisation in terms of performance improvements that manifest immediately after switching the stimulus to a new location.
Alternatively, generalisation may also be seen in terms of enhanced learning potential at the new location (e.g. by steepening the learning curve). The present paradigm is not able to detect this form of generalisation if it does not affect overall performance within the 280 trials of that phase of the task.
We found that behavioural performance improved rapidly and plateaued early on during task execution. Such initial fast learning has also been observed in the original TDT (Karni & Sagi, 1993).
Later studies repeatedly showed performance limits or even deterioration of performance, especially during prolonged training (Censor & Sagi, 2009; Ofen et al., 2007). In our data, the significant increase in performance from baseline to retrieval can to a large extent be attributed to improvements that occurred over the retention interval. Such performance gains could either reflect memory consolidation or a release from training-induced adaptation (Censor et al., 2006; Mednick et al., 2005). Specifically, it has been argued that over-exposure to the stimulus during training may reduce performance within and between sessions on the same day.
Sleep may then be required to release this (over-)adapted network state. Based on our manipulation, we cannot assess the relative contributions of consolidation versus release of adaptation.
However, in the absence of sleep effects, the most parsimonious explanation for our data is that the passage of time alone is sufficient for adaptation to be released and/or perceptual memories to be consolidated following binocular disparity discrimination learning. The lack of an interaction between offline changes in performance and sleep may appear surprising given a long history of perceptual learning studies that have documented beneficial effects of sleep (Deliens et al., 2014; Gais et al., 2000; Karni et al., 1994; Stickgold et al., 2000; Tamaki et al., 2020). However, practically all demonstrations of sleep effects were obtained with a single experimental paradigm, the TDT (Karni & Sagi, 1991). The nature of this particular task differs substantially from our protocol, as well as from many other tasks classically used to investigate perceptual learning (Aberg et al., 2009; Le Dantec & Seitz, 2012).
Most importantly, difficulty in the classical TDT is manipulated, and learning is induced, by reducing the delay between target and mask stimuli. Recent evidence suggests that learning on the TDT largely relates to improved temporal learning (Wang et al., 2013) and that appropriate training procedures can induce transfer of texture discrimination learning. This could explain why a growing number of studies report substantial amounts of ocular and retinotopic transfer (Wang et al., 2012; Xiong et al., 2016; Zhang et al., 2010), in contrast to findings based on the classical paradigm (Karni & Sagi, 1991).
The main rationale behind the present follow-up study was to account for the potentially confounding effects of numerous visual-field anisotropies (Abrams et al., 2012; Karim & Kojima, 2010), particularly those between the vertical hemifields (Herde et al., 2020; Previc et al., 1995; Rauss et al., 2009; Skrandies, 1987). For example, differences in spatial and temporal resolution between the upper and lower visual field could have masked high-level transfer of learning as well as sleep benefits in our previous study. Some retinotopic specialisations are even directly linked to disparity processing, such as preferences for crossed versus uncrossed disparities in different parts of the visual field (Grabowska, 1983; Previc et al., 1995). The present data suggest that these anisotropies do not affect the consolidation of binocular disparity discrimination learning. While there are also horizontal asymmetries in visual processing (Grabowska, 1983; Okubo & Nicholls, 2008), these are less likely to exert a substantial influence in the context of our stimuli and task (Breitmeyer et al., 1975; Julesz et al., 1976).
Thus, we are confident that the present data, in combination with our previous results (Klinzing et al., 2020), provide strong evidence for retinotopically local learning of disparity discrimination that is independent of both sleep and the particular combination of training and testing locations.
The interpretation of our present findings is limited by the study's sample size. We cannot exclude that more subtle effects of sleep on perceptual learning evaded detection by our analysis in this and our previous study. We therefore encourage further research in this area, including replications using the same experimental paradigm. Furthermore, our experimental design cannot rule out a confounding influence of circadian rhythm. In the sleep group, retrieval and generalisation were tested in the morning while in the wake group they were assessed in the evening. It is possible that in the sleep group, time of the day of the second session masked improved perceptual skills. This will be an important issue for further investigation. While binocular disparity discrimination is well-suited for our particular research question, it is unclear how far the present results can be generalised to other tasks, such as contrast detection, motion detection, or fine form judgements (Seitz, 2017). Future studies should investigate sleep effects on a wider range of perceptual skills.
The overall picture emerging from literally thousands of studies is one of a surprising degree of flexibility in low-level sensory processing, even in the adult brain (Rolfs et al., 2018). However, the boundary conditions for this flexibility are much more restricted than for higher forms of learning. Delineating the conditions necessary and sufficient for low-level learning to be re-processed and generalised will be a crucial step towards integrative models of robust perception (Jacobsen et al., 2018).

CONFLICT OF INTEREST
No conflicts of interest declared.

AUTHOR CONTRIBUTION
JGK and KR designed the study. HN provided experimental setup and code. JGK coordinated the study, carried out the experiments, and analysed the data. JGK and KR carried out the statistical analyses and drafted the manuscript. All authors reviewed the manuscript and gave final approval for publication.

DATA AVAILABILITY STATEMENT
All performance data and statistical analyses have been uploaded to the Open Science Framework and are publicly available (https://osf.io/zqkuw/).