Sleep scoring in rodents: Criteria, automatic approaches and outstanding issues

There is nothing we spend as much time on in our lives as we do sleeping, which makes it even more surprising that we currently do not know why we need to sleep. Most of the research addressing this question is performed in rodents to allow for invasive, mechanistic approaches. However, in contrast to human sleep, we currently do not have shared and agreed upon standards on sleep states in rodents. In this article, we present an overview on sleep stages in humans and rodents and a historical perspective on the development of automatic sleep scoring systems in rodents. Further, we highlight specific issues in rodent sleep that also call into question some of the standards used in human sleep research.


| INTRODUCTION
Understanding sleep in rodents provides an important tool to investigate the underlying mechanisms of neural processing during sleep, develop clinical models to understand the pathologies underlying sleep disorders and understand how drugs impact the sleep structure and quality. For humans, criteria have been developed to enable standardization and comparability of sleep state scoring; however, so far, such standards are lacking in rodent research. In this article, we highlight what we currently know about sleep state criteria, and we provide a historical perspective on automated sleep scoring methods in rodents. Finally, we will discuss the need for standardization in rodent sleep research as well as question the current definition of sleep substates in humans and rodents.
Human sleep states were first clearly defined in the sleep scoring manual by Rechtschaffen and Kales (1968) and later refined in the sleep scoring manual of the American Academy of Sleep Medicine (Iber et al., 2007), as we will highlight next.
In short, human sleep is divided into two main states, non-REM (NREM) and REM sleep (REM), that alternate throughout the night in NREM-REM cycles that last 60-120 min (Figure 1). NREM is then further divided into substages S1-S4 (N1-N3 in the American Academy of Sleep Medicine [AASM] classification).
S1 is marked by the disappearance of the alpha oscillations that dominate quiet wake and tends to occur mainly in short epochs during the transition from wake to sleep. S2 is defined by the appearance of sleep spindles and K-complexes (global slow oscillations or global/large delta waves), and is the most common sleep stage in humans (50% of sleep). S1 and S2 together compose light NREM sleep. S3 and S4 are a gradual deepening of NREM sleep and are defined by the occurrence of slow wave activity (global and local delta waves) for more than 30% and 50% of the epoch, respectively. These two stages together are also known as slow wave sleep (SWS) or deep NREM sleep and in the newer AASM criteria are combined into N3. With the exception of this change, the AASM and Rechtschaffen and Kales criteria largely overlap, and applying one or the other manual will not create a large difference in sleep scoring (Moser et al., 2009). However, for continuity and comparability as well as due to habit, most basic researchers still stick to the S1-S4 nomenclature, whereas N1-N3 is now more common in clinical research.
The final and most renowned stage is REM sleep (also known as paradoxical sleep, PS). REM is defined by a wake-like EEG (low amplitude, desynchronized oscillations), muscle atonia (usually measured at the chin) and rapid eye movements. REM periods without rapid eye movements are called tonic REM, whereas periods with rapid eye movements and muscle twitches are called phasic REM (Simor et al., 2020).
A night of sleep will usually also contain 10-15 microarousals, which are very brief periods of wake that usually occur at the end of a NREM-REM cycle or at other sleep stage transitions. Most humans will not gain consciousness during microarousals and therefore will also not remember these awakenings.
Although the length of NREM-REM cycles stays the same throughout the night, the sleep stages that dominate the cycle change. Early in the sleep period, SWS will dominate the cycle, whereas later, it will be REM. S2 remains stable throughout the night and is the most common stage.
As mentioned above, NREM sleep in humans is divided into 'light' and 'deep' sleep. The separation of the two states is defined by the amount of delta waves that are present in the period (>30%), which are hallmarks of deep NREM. Interestingly, although NREM-SWS is known as deep sleep, REM sleep has a higher arousal threshold than any of the NREM states; therefore in some ways, REM is actually the 'deepest' sleep stage, if deep sleep is taken to correspond to the highest arousal threshold.
Currently, the gold standard for human sleep research and clinics remains manual scoring by experts. However, due to the time-consuming nature of manual scoring, many automatic scoring systems have been developed (e.g., see Anderer et al., 2005; Danker-Hopfe et al., 2005). Because the criteria for sleep states are very well-defined, automatic scoring of human data works reasonably well as long as healthy subjects are used.
Automatic scoring tends to mainly fail with patient data. However, in this review, we will focus on automatic scoring of rat data; for approaches in humans, see recent reviews (Fiorillo et al., 2019; Phan & Mikkelsen, 2022).
FIGURE 1 Sleep states: current common practice. Sleep states are displayed that are most commonly used in human (a) and rodent (b) sleep research. On the left, an example of EEG traces and on the right idealized hypnograms. Adapted from Genzel et al. (2014).

| RODENT SLEEP IN COMPARISON TO HUMAN SLEEP
Although human sleep stages have clearly been defined and there is a consensus on criteria used for scoring, there is no agreed-upon manual for sleep scoring in rodents. The principal sleep structure of two alternating states (NREM and REM) is conserved in rodents; however, there are some biological differences as well as differences in nomenclature.
In most laboratories, rodent sleep will be semiautomatically scored with manual curation of the state transitions. First, a threshold per animal on electromyogram (EMG) activity or movement will be set to separate wake from sleep; next, the sleep periods will be split into NREM and REM by computing the theta/delta spectral power ratio and setting a threshold. Subsequently, an expert scorer will visually inspect the signal and rescore misclassifications. These tend to occur mostly at the transitions between sleep stages. Thus, in the end, most laboratories will use the three main sleep stages (wake, NREM and REM; see Figure 1b) and will not further subclassify these states.
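The two-step thresholding described above can be sketched in a few lines. This is only an illustration under stated assumptions, not any laboratory's actual pipeline: the function name `score_epochs`, the per-epoch feature lists and the threshold values are hypothetical, and in practice the thresholds are set per animal and the output is still curated manually.

```python
from typing import List


def score_epochs(emg_power: List[float],
                 theta_delta_ratio: List[float],
                 emg_threshold: float,
                 ratio_threshold: float) -> List[str]:
    """Assign 'wake', 'NREM' or 'REM' to each epoch.

    Step 1: epochs with EMG power above emg_threshold are scored as wake.
    Step 2: the remaining (sleep) epochs are scored as REM when the
    theta/delta ratio exceeds ratio_threshold, otherwise as NREM.
    Both thresholds are hypothetical, per-animal values.
    """
    if len(emg_power) != len(theta_delta_ratio):
        raise ValueError("feature lists must have equal length")
    states = []
    for emg, ratio in zip(emg_power, theta_delta_ratio):
        if emg > emg_threshold:
            states.append("wake")
        elif ratio > ratio_threshold:
            states.append("REM")
        else:
            states.append("NREM")
    return states


# Toy example with made-up feature values (arbitrary units).
emg = [5.0, 0.8, 0.6, 0.3, 4.2]
ratio = [1.5, 0.4, 0.5, 3.0, 2.0]
hypnogram = score_epochs(emg, ratio, emg_threshold=2.0, ratio_threshold=1.0)
print(hypnogram)
```

Because each epoch is scored independently, errors in such a scheme cluster at state transitions, which is where the manual curation step is typically focused.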
Unlike humans, rats and mice are nocturnal and polyphasic, and their individual sleep epochs as well as NREM-REM cycles are much shorter (Figure 1b). Sleep cycles in rats last 10 min compared to 60-120 min in humans (Lacroix et al., 2018). Furthermore, there are differences in nomenclature, especially in older articles in which SWS is used as a synonym for all NREM sleep instead of only for the deeper stages of NREM (S3 + 4/N3), as is common in human sleep research (Genzel et al., 2014). However, although most studies do not distinguish between different NREM substates, recent work has demonstrated that NREM sleep in rodents exhibits a human-like structure with different types of NREM (Lacroix et al., 2018). But even those that do separate light and deep NREM will often call them SWS I and SWS II instead of light sleep and SWS (see examples below). When criteria similar to those in humans are applied to rodent data, the same basic sleep structure is uncovered, with sleep being dominated by the equivalent of Stage 2 sleep (60%); however, individual sleep bouts last only a few seconds in contrast to minutes in humans (Lacroix et al., 2018). Accordingly, most studies do not scrutinize different substages, as the combination of shorter bouts, an increased number of transitions and often longer recording periods (classic sleep studies are 24 h instead of 8 h in humans) would make the scoring process much longer and more laborious.
Similarly, one can distinguish phasic and tonic REM in rodents (Sánchez-López & Escudero, 2011); however, because EMG and electrooculogram (EOG) electrodes are much more invasive and difficult to include in rodents, this differentiation is rarely made. It is possible to use recordings from the hippocampus for this separation because phasic REM displays a transient acceleration of the theta rhythm lasting 2 s, with increased hippocampal neuronal activity, increased power of high-frequency oscillations and enhanced theta- and gamma-band coherence within the hippocampal formation (de Almeida-Filho et al., 2021). Hippocampal recordings, although rare in humans, are more often included in rodent sleep studies, especially if the hippocampal ripple oscillations are within the scope of the study.
In sum, although sleep stages and structure seem to be conserved in rodents, shorter bouts and faster cycling of these sleep stages have led to differences in the usage of substates as well as nomenclature.

| DIFFERENT POTENTIAL LIGHT NREM SLEEP STAGES IN RODENTS
Currently, most studies rely on semiautomatic scoring with manual curation of state transitions with in-house, individual scoring scripts. However, because clear and consensus-based criteria do not exist yet for rodents, there is no agreement on which features would define light versus deep NREM sleep. Further, often many different states are lumped together into 'Quiet Wake', which could potentially have different origins and functions. Next, we will highlight the different behaviours and states that tend to be or could be classified as Quiet Wake or light NREM sleep.
Authors use 'quiet wake' to refer to different types of behaviour that animals display in the task environment (e.g., track, open-field box) as well as behaviours displayed in the sleep environment (e.g., home cage, sleep recording box, bucket/inverted flower pot used for between-task rest):
1. In the task environment, animals will rest between task runs. This may just be a short pause (a few seconds) at, for example, the end of the track before completing the next run, but sometimes such breaks can be longer and include grooming. Such breaks can also occur in the middle of a task run and then are often caused by outside distractors. These different breaks can support different functions, such as memory consolidation via memory reactivations (e.g., often seen during rest periods at the end of track running after the animal ate the treat) or evaluation of potential threats (e.g., when outside stimuli such as sudden noise or smell 'distract' the animal). In the task environment, these periods are usually identified by their low theta power and should only comprise short periods of the session.
2. In the sleep environment, one could argue that all behaviours are quiet rest, because the animals are not actively doing a task. However, as in the task environment, there will be theta-dominant periods and periods with less theta. During theta-dominant periods, the animals are usually exploring the environment or attending to outside distractors (Figure 2). Animals will also spend larger periods grooming and sitting or lying down relaxed before falling asleep. The special difficulty with rodents is that they can lie down relaxed with their head resting on the floor and be awake, but they can also sleep with open eyes. Therefore, it is very difficult to determine the transition between relaxed wake and early sleep. In humans, relaxed wake is dominated by alpha waves, whose disappearance defines the onset of sleep. Rodents do not have this alpha signal to allow for clear identification of the earliest sleep stages.
There are multiple sleep phenomena that could be seen as light non-REM sleep:
1. When falling asleep, there is a period where cortical electrodes will show a low-amplitude signal, but the hippocampus would show a high density of ripples (Figure 3). Usually, this period is scored as quiet wake (QW), but it could potentially correspond to human S1. Rodents do not show a dominant alpha oscillation when resting with closed eyes; therefore, the transition from QW to S1, which is very clear in humans, is less clear in rodents. It would be important to identify whether these periods correspond to S1, because they are likely to contain several memory reactivation events due to their high ripple density.
2. Next in the sleep cycle, rodents will show alternating periods of delta bursts and low-amplitude EEG (Figure 3). The delta bursts will be accompanied by a high density of ripples as well as other sleep oscillations. Some will score these as alternating periods of wake and SWS/NREM, but potentially, this would be one state with alternating subperiods and would correspond to S2 in humans. These alternating periods were formerly often described as one of the cyclic alternating patterns (CAP) (Migueis et al., 2021) and seem to be regulated by the locus coeruleus (LC) active and silent periods that then also modulate arousability (Lecci et al., 2017; Osorio-Forero et al., 2021). This sleep state is especially challenging to score automatically because it is defined by its oscillatory macrostructure (alternation of two different spectral profiles) and cannot be determined when only inspecting 1-s epochs.
3. In rats (less so in mice), one can identify a unique sleep stage characterized by many and long spindles (Gottesmann, 1973; Vyazovskiy et al., 2004). Because this spindle-dominated stage occurs after NREM and before the beginning of REM sleep, it is called transitional sleep (TS, Figure 3) or intermediate sleep. TS is visually distinguishable from REM sleep due to the irregular shape of the continuous oscillation that also has a higher frequency than REM theta. It is challenging for an automatic algorithm to distinguish TS from REM due to the brief duration of TS (1.8% of the total sleep as reported by Grieger et al. (2021)). However, the current consensus would be that TS would be part of NREM due to the spindle dominance.
The detection of TS might alter our understanding of sleep in rodents. As a result of more TS detections, the total number of NREM bouts will increase with a corresponding shortening of their duration (Mandile et al., 1996). It is currently unclear if humans have TS. Although it is not as visually clear as in rats, a study (with n = 11,000) could show an increase in fast spindle activity preceding a transition from N2 to REM sleep and a decrease in slow spindles, potentially reflecting the same phenomenon (Purcell et al., 2017). However, in a study directly comparing the change in EEG power spectrum in rodents against humans, the activity within the sigma band (11-16 Hz) was more enhanced in rodents than in humans (Bjorvatn et al., 1998).
It would be tempting to speculate that TS is a unique state that is a result of the system switching from NREM to REM with the corresponding changes in neurotransmitters (Navarro-Lobato & Genzel, 2018; Samanta et al., 2020). In NREM, serotonin and acetylcholine are low relative to the REM state, in which serotonin remains low and acetylcholine exhibits high(er) levels. This will bring the hippocampus to switch from 'sender mode' (high output and low input) to 'receiver mode' (high input and low output) (Schall et al., 2008; Schall & Dickson, 2010). We also know that silent LC neurons are required for the presence of both spindles and REM sleep (Swift et al., 2018a). Thus, LC neurons shifting from sparse activity to silence before the REM state, while acetylcholine levels are still low, could allow for the continuous spindle state in TS. The switch from TS to REM (and therefore spindle to theta) would then occur due to the subsequent increase of acetylcholine (which is involved in theta rhythm regulation in the medial septum). Although this would be a potential explanation for TS, more evidence is needed in support of this idea.
In sum, there seem to be multiple sleep phenomena in rodents (QW, alternating delta/desynchronization and TS) that are likely to correspond to the light NREM stages known in humans (S1 and S2), highlighting the heterogeneity of this sleep state.
FIGURE 3 Potential examples of NREM light sleep states. (a) After initially lying down, rats will show a state that is currently commonly scored as quiet wake, where the cortex (blue) is still desynchronized, whereas the hippocampus (black) shows a burst of ripples. Potentially this could correspond to human S1 NREM. The hippocampus keeps the rippling state; slowly, the cortex will show more and more delta waves (shown on the right). (b) Next, usually rodents will show alternating periods that show either a lot of delta and ripples (black bars) or are low amplitude in the cortex (delta free, white bars). Above more zoomed in, below wider view. (c) After the main NREM bout, rats will show a spindle-dominated state that is known as transitional state (TS). Usually, TS is followed by either wake or REM. TS shows waxing and waning spindles (11-16 Hz) in the hippocampus, and the cortex can display slow waves, spindles or be low-amplitude signal. For contrast, below examples of REM sleep, which differ from TS by the regular theta (5-9 Hz) in the hippocampus and low-amplitude signal in the cortex. Shown is unpublished data.

| HISTORICAL PERSPECTIVE OF AUTOMATIC SLEEP SCORING IN RODENTS
In this section, we provide a historical overview of the literature on automated sleep scoring in rats. Our aim is to highlight the different methods used in automated sleep classification, their progression over the years and the outcomes of different studies. We will present the work in chronological order and keep the sleep stage names that were scored as used in the original manuscripts; therefore, SWS is often used to describe what is likely a general NREM sleep state. In each section, we provide a concise summary of the main points related to each of the presented methods. Detailed information about data processing for sleep classification is provided in Table 1.

| The 60s and 70s
One of the earliest attempts to describe sleep stages in rats was made in 1961 by Michel et al., as reported by Ruigt et al. (1989). However, manual sleep scoring was considered an exhausting and time-consuming process, especially for long recording times, and could be prone to inter-rater variability, specifically if the raters were inexperienced. Therefore, several attempts were made already in the late 60s and 70s to develop an automated sleep staging method for rodents to uncover their sleep architecture and accomplish scoring consistency between labs.
One of the earliest attempts to develop a fully automated method was conducted by Kohn et al. (1974). They used a hardware hybrid system adapted originally from human experiments (Table 1). This system was initially used to classify human sleep stages into the five conventional phases (S1-S4, REM). In this work, Kohn et al. (1974) claimed that sleep in laboratory animals is not a complex phenomenon compared to humans, disregarding earlier experimental findings reported by Gottesmann et al. (1971). Therefore, the system classified sleep only into three main sleep stages: wake (W), non-rapid eye movement (NREM) and rapid eye movement (REM).
Because REM and NREM sleep stages would potentially have different functions, selective sleep deprivation requiring online sleep scoring is an important research tool. Winson (1976) devised a system that automatically identified REM sleep to elicit awakening in animals before entering the REM sleep stage. Therefore, the main focus of this system was identifying theta oscillations from the hippocampus. Subsequently, SWS was determined by exclusion as periods with absent theta oscillation and no motion. In this study, SWS was defined as QW, which is not an accurate depiction of this sleep state, because NREM/SWS encompasses several stages, as demonstrated by the subsequent work of Gottesmann et al. (1977).
To minimize the artefacts introduced by using wires and body movement during recordings, Gottesmann et al. (1977) and Neuhaus and Borbely (1978) devised a wireless (radiotelemetry) recording system. The recording system built by Gottesmann et al. (1977) comprised four channels and could detect and eliminate additional recording artefacts automatically. The pass-band frequency of the acquired signal was between 0.22 and 400 Hz. They implanted the rats with electrodes in frontal and occipital brain areas. Furthermore, they used two additional electrodes to obtain the EMG signal from the neck muscle and the EOG signal from the eye socket. To automatically classify sleep, the authors computed the wideband (5-20 Hz), delta (1.5-4 Hz), spindle (10-13 Hz), theta (5-8.5 Hz) and global EMG powers. The periods of high theta power were detected based on delta/theta and spindle/theta ratios. The computation of these parameters provided a framework to classify the different sleep stages of the rats based on assigned probabilities for each sleep stage. Interestingly, this study identified sleep stages (Table 1) including two REM sleep stages, with and without eye movement periods, which would potentially correspond to phasic and tonic REM, respectively. Because it was challenging to manually identify the sleep microstates that the automated system recognized, the authors combined several sleep substages to enable the comparison and validation of the automatic scoring system based on manually scored data. Furthermore, this emphasized the difficulty of manually sorting the fine-grained sleep phases of rodents.
Unlike previous work, Neuhaus and Borbely (1978) only identified three vigilance states: waking, SWS and paradoxical sleep (PS, synonym for REM sleep). This system relied on the ratio between EEG power and motor activity to differentiate between wake and SWS. PS was identified by low-pass filtering the EEG signal, counting the zero crossings and then comparing the result to the reference signal.
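Zero-crossing counting, as used for PS detection above, gives a cheap estimate of the dominant frequency of a filtered signal, because a regular oscillation crosses zero twice per cycle. The sketch below illustrates only that idea on a synthetic sine wave; it is not a reconstruction of the original hardware, and the function names and the 200 Hz sampling rate are assumptions made for the example.

```python
import math
from typing import Sequence


def count_zero_crossings(signal: Sequence[float]) -> int:
    """Count sign changes between consecutive non-zero samples."""
    crossings = 0
    previous = 0.0
    for sample in signal:
        if sample == 0.0:
            continue  # exact zeros carry no sign information
        if previous != 0.0 and (sample > 0) != (previous > 0):
            crossings += 1
        previous = sample
    return crossings


def dominant_frequency_hz(signal: Sequence[float], sampling_rate_hz: float) -> float:
    """Estimate frequency as half the zero-crossing rate (two crossings per cycle)."""
    duration_s = len(signal) / sampling_rate_hz
    return count_zero_crossings(signal) / (2.0 * duration_s)


# Synthetic theta-like signal: 7 Hz sine, 2 s, sampled at an assumed 200 Hz.
fs = 200.0
theta_like = [math.sin(2 * math.pi * 7.0 * (n + 0.5) / fs) for n in range(400)]
estimate = dominant_frequency_hz(theta_like, fs)
print(estimate)
```

The estimate is only meaningful after filtering, since broadband noise adds spurious crossings; this is presumably why the low-pass stage precedes the counting step in the description above.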
Finally, Johns et al. (1977) introduced another automatic analysis system based on four multichannel analyzers, and each channel was equipped mainly with a buffer, integrate and hold facility, amplifier and bandpass filter. One of the four channels was assigned to the EMG signal, which was band-pass filtered at 10-150 Hz. The other three channels were used to filter the electrocorticogram (ECoG) signal into three different frequency bands: the wide-frequency band (0.5-48 Hz), delta band (0.5-4.5 Hz) and theta band (5.5-8.5 Hz). Next, the output was passed to an analogue-digital converter and saved on a cassette recorder. The previous criteria enabled the system to identify four sleep stages: wake, SWS I, SWS II and REM sleep. In this automated system, several criteria assisted the disambiguation between SWS I and SWS II phases. First, the SWS I preceded the SWS II. Second, the EEG during SWS I exhibited mainly irregular slow wave activity followed by spindles and the EMG showed medium-low voltage amplitudes. As animals transitioned from SWS I to SWS II, the EEG signal started to display a more regular high-voltage delta oscillation with low-amplitude EMG.
Work in the 60s and the 70s relied mainly on hardware and logic circuits to conduct sleep recordings and generate automated sleep scoring systems. Notably, creating these systems required engineering knowledge to design and implement the hardware logic within different circuits, which posed several challenges to the general use and implementation of such systems in labs that do not possess this knowledge. The most comprehensive sleep scoring method during this era was developed by Gottesmann et al. (1977). Several factors contributed to this success. First, they used a wireless system, which minimized the number of artefacts compared to other recording systems developed during the same time. Second, and in contrast to other studies, they used a smaller time epoch, which was essential to uncover the identified sleep microstates. Third, they implanted electrodes within the ocular cavity to measure EOG activity during sleep, the only study to do so during this period. Thus, EOG recordings enabled them to distinguish between phasic and tonic REM sleep states. Fourth, they considered spindle power, which allowed identification of TS states. Finally, they recorded the EEG activity at the frontal and hippocampal brain areas, unlike, for example, Winson (1976), who recorded only the hippocampal oscillations. This recording configuration was optimal to accurately identify theta and delta activity, allowing the accurate determination of SWS and REM states. Another distinctive feature of this era was the identification of two SWS sleep states as shown by Johns et al. (1977) and others. Both delta wave shape and EMG power were critical to disambiguate these two sleep substates, which could potentially correspond to light and deep sleep in humans. By combining the results of the different studies, we conclude that sleep architecture in rodents is complex and rich in fine-grained sleep states. Although the studies during this era used primarily hardware logic to create automated sleep scoring systems, several other factors critically contributed to the outcome, for example, epoch length, feature extraction and the placement sites of different electrodes. In summary, the 60s and the 70s marked the start of rodent sleep research. Several attempts were made to reduce manual scoring efforts and advance our understanding of rodent sleep structure. The researchers relied primarily on hardware logic to develop the abovementioned automated systems, as both computing power and software development were limited during this period. However, despite these limitations, the automated systems accomplished high classification accuracy (92%-98%) when compared to a human scorer. Considering the outcome of this era, we next examine the automated sleep scoring methods of the 80s to evaluate the progress in our understanding of sleep architecture in rodents.

| The 80s
In the early 80s, researchers started to use software instead of hardware logic to classify different sleep stages automatically, as we will discuss next.
One of the earliest works to benefit from this advancement was conducted by Chouvet et al. (1980). In their system, they employed solid state logic comparators to simulate linear discriminant analysis and extend it to classify sleep into three main sleep stages. Linear discriminant analysis was considered a dimensionality reduction technique that classified the data points in the feature space into a specific number of clusters based on their particular properties. In this study, they relied on four main features to identify the different sleep stages: the theta/delta ratio, the variability of the EEG amplitudes, the zero crossings of the EEG and the integrated EMG signal. These features were computed for each epoch (30 s) and were projected to a higher dimensional space, and linear discriminant analysis was applied to identify the three main sleep stages (W, SWS and PS). Then, Mendelson et al. (1980) advanced the sleep classification methods further by using continuous frequency analysis to accurately estimate the power of the different frequency bands rather than relying solely on signal amplitude. Nevertheless, the system yielded only the three main sleep stages (W, SWS and PS). It is worth mentioning that these systems used very long epochs for sleep classification (30 s) and fewer features compared to previous studies (see previous section, Gottesmann et al. (1977)). Furthermore, linear discriminant analysis was mainly used to solve supervised classification problems. Therefore, the outcome of this method depended primarily on the scorer's accurate identification of the different sleep stages. Thus, it did not allow the scoring system to reveal the fine-grained sleep states.
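To make the supervised nature of linear discriminant analysis concrete, the following sketch fits a two-class Fisher discriminant on manually labelled epochs. It is deliberately reduced to two features and two states so that the 2x2 matrix inverse can be written out by hand; the original system used four features, three states and hardware comparators, so this is an illustration of the principle rather than a reimplementation. All function names and feature values are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (theta/delta ratio, integrated EMG), arbitrary units


def class_mean(points: List[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit_lda(class_a: List[Point], class_b: List[Point]):
    """Two-class Fisher LDA for two features.

    Returns weights w and threshold c; a point x is assigned to
    class_b when w . x > c, otherwise to class_a.
    """
    ma, mb = class_mean(class_a), class_mean(class_b)
    # Pooled within-class scatter matrix [[sxx, sxy], [sxy, syy]].
    sxx = sxy = syy = 0.0
    for pts, m in ((class_a, ma), (class_b, mb)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("singular scatter matrix")
    # w = S^-1 (mb - ma), using the closed-form inverse of a 2x2 matrix.
    dmx, dmy = mb[0] - ma[0], mb[1] - ma[1]
    w = ((syy * dmx - sxy * dmy) / det, (-sxy * dmx + sxx * dmy) / det)
    # Decision threshold: projection of the midpoint between the class means.
    c = w[0] * (ma[0] + mb[0]) / 2 + w[1] * (ma[1] + mb[1]) / 2
    return w, c


def predict(w, c, point: Point, label_a: str = "SWS", label_b: str = "PS") -> str:
    return label_b if w[0] * point[0] + w[1] * point[1] > c else label_a


# Made-up training epochs labelled by a (hypothetical) human scorer.
sws_epochs = [(0.3, 1.0), (0.5, 1.2), (0.4, 0.9), (0.6, 1.1)]
ps_epochs = [(2.0, 0.5), (2.4, 0.4), (1.8, 0.6), (2.2, 0.5)]
w, c = fit_lda(sws_epochs, ps_epochs)
print(predict(w, c, (0.45, 1.0)), predict(w, c, (2.1, 0.45)))
```

The discriminant is learned entirely from the scorer's labels, which is why such a method can reproduce the states the scorer already distinguishes but cannot reveal states the scorer did not label.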
As previously discussed, using hardware-based classification systems required engineering knowledge to implement the logic within the circuits. An important development in automated sleep scoring was associated with the beginning of the microcomputing era. Vivaldi et al. (1984) introduced a classification system based on a microcomputing system to identify different sleep stages and quantify the waveforms associated with them, namely, spindles, delta waves and theta/rhythmical slow activity. The algorithm used the count of delta and sigma waves to identify SWS. On the other hand, rhythmical slow activity (theta) in the hippocampus was employed to recognize the REM sleep periods. Unlike the previous two studies, the hallmark of this system was the isolation of waveforms in different frequency bands and their use in sleep classification instead of depending on the amplitude/power in sleep staging. Nevertheless, it was not an adequate method to identify, for example, light and deep sleep because the difference between these states emerges from the spectral profile of delta oscillations rather than the count. As a result, this system managed mainly to identify the three main sleep stages (W, SWS and PS).
A more advanced method was developed by van Luijtelaar and Coenen (1984), who used a microcomputing system to scrutinize the microsleep stages in rodents. Their algorithm was based on averaging the running EEG and constructing average spectrograms to identify the variability in the EEG and the theta/delta ratio. Together with the amplitude index of the EMG, the authors were able to classify sleep into four stages: wake, light SWS, deep SWS and REM. Interestingly, rodent sleep showed a periodic pattern similar to human sleep where deep SWS was abundant during the early hours of sleep, whereas REM sleep tended to increase in duration towards the end of sleep. The deep SWS was identified as high-amplitude delta periods, whereas the medium-amplitude delta periods that were associated with high-amplitude variability were classified as light SWS. This study provided evidence of the conserved sleep pattern between rodents and humans. Although this study was insightful for its time, it depended solely on the hippocampal EEG to identify different sleep patterns with no placement of cortical recording electrodes to precisely inspect the delta spectral profile.
To scrutinize the substages of NREM sleep in rodents, Bergmann et al. (1987) devised a clustering method, the parametric animal state scoring system (PASS), which used the minimum distance between the measured power in specific EEG frequencies and standard values assigned to each sleep state. Instead of identifying NREM sleep as SWS I and SWS II as the previous studies, they classified the NREM sleep into high-amplitude and low-amplitude states. The low-amplitude state was characterized by low EEG amplitude, low theta amplitude and low EMG values. The high-amplitude state could presumably correspond to deep sleep, whereas the low-amplitude state would have then depicted light sleep. In their methods, the authors did not use specific clustering algorithms; rather, they relied on the distribution of different values (e.g., EEG amplitudes) to distinguish between low- and high-amplitude NREM sleep. Therefore, the identification of other sleep microstates would be challenging considering only theta and delta amplitudes. Yet, this study added to previous reports and further emphasized the existence of light and deep sleep states in rodents.
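The minimum-distance principle described above can be illustrated with a nearest-reference classifier. The sketch below is an assumption-laden toy: the reference ("standard") values, the choice of three features and the use of Euclidean distance are all hypothetical and are not taken from the PASS publication.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical standard values per state:
# (EEG amplitude, theta amplitude, EMG amplitude), arbitrary units.
STANDARDS: Dict[str, Tuple[float, float, float]] = {
    "wake": (0.4, 0.6, 1.0),
    "high-amplitude NREM": (1.0, 0.3, 0.2),
    "low-amplitude NREM": (0.3, 0.2, 0.2),
    "REM": (0.4, 1.0, 0.1),
}


def nearest_state(features: Sequence[float],
                  standards: Dict[str, Tuple[float, ...]] = STANDARDS) -> str:
    """Return the state whose standard values lie closest to the epoch's features."""
    return min(standards, key=lambda state: math.dist(features, standards[state]))


# Two made-up epochs.
print(nearest_state((0.35, 0.25, 0.15)))
print(nearest_state((0.95, 0.30, 0.25)))
```

In such a scheme the set of detectable states is fixed by the reference table, so a state without an entry (for example, a spindle-dominated transitional state) can only be absorbed into its nearest neighbour.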
Because computing power was limited in the 80s, developing a fully automated system was an expensive process. Therefore, Clark and Radulovacki (1988) introduced an inexpensive automated classification system based on the IBM computer that was popular during this time. The system relied mainly on the EEG and EMG amplitudes to define the three main sleep stages: W, SWS, and PS. Because this method did not depend mainly on filtering the signal into its different relevant frequency bands, the algorithm confused sleep and wake states several times, which required the intervention of a human investigator to correct such deviations. Because this system relied on the EEG amplitude, disregarding the contribution of specific frequency subbands to the overall spectral results, this method would not be suitable for accurately depicting the fine-grained sleep architecture in rodents.
A leap towards more accurate sleep classification was taken by Gandolfo et al. (1988), who introduced a more detailed algorithm based on the Apple II computing system. In their work, they discussed the problem of the brief duration of sleep microstates in rodents, such as intermediate sleep. Therefore, they used a much shorter epoch duration than the previous studies (1 s). Importantly, they used the following parameters to identify more fine-grained sleep patterns in rodents: frontal cortex wide-frequency band (5-20 Hz), theta rhythm, EOG and EMG. Subsequently, they identified more sleep stages in rodents: active wake, wake without theta oscillation, slow wave sleep, anterior spindles and deep slow wave, intermediate sleep, paradoxical sleep without eye movement and paradoxical sleep with eye movement. Furthermore, the study emphasized the individuality of sleep patterns between rats, which should be taken into account before starting sleep data processing. Because this study identified more, and more precise, sleep patterns in rats, it provided further evidence that sleep structure in rats resembles human-like states.
Another important advancement was the use of hardware-based Fast Fourier Transforms (FFT), as proposed by Goeller and Sinton (1989). Their system was based on performing an FFT to obtain the relative power of more distinct frequency bands (Table 1). This refined frequency classification enabled the authors to divide SWS into two separate stages, SWS I and SWS II. The key factor in distinguishing the two microstates was the threshold assigned to the total power in the EEG. If the total EEG power was above the threshold, the epoch was defined as SWS II, and if it was below the threshold, it was identified as SWS I. SWS I is the drowsy state before the appearance of deeper delta oscillations and is potentially the same as light sleep (Stages 1 and 2) in humans.
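The threshold rule described above can be illustrated with a minimal sketch. This is not the original hardware implementation: the naive DFT, the 0.5-30 Hz summation range, the function names and the threshold value are all assumptions chosen for illustration.

```python
import cmath
import math

def power_spectrum(samples, fs):
    """Naive DFT power spectrum: list of (frequency_hz, power) up to Nyquist."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        spectrum.append((k * fs / n, abs(coeff) ** 2 / n ** 2))
    return spectrum

def total_power(samples, fs, f_lo=0.5, f_hi=30.0):
    """Sum of spectral power between f_lo and f_hi (range is an assumption)."""
    return sum(p for f, p in power_spectrum(samples, fs) if f_lo <= f <= f_hi)

def label_sws(samples, fs, threshold):
    """Epochs already scored as SWS: above threshold is SWS II, otherwise SWS I."""
    return "SWS II" if total_power(samples, fs) > threshold else "SWS I"

# Synthetic 1-s epochs at a hypothetical 64 Hz sampling rate.
fs = 64
deep = [2.0 * math.sin(2 * math.pi * 2 * t / fs) for t in range(fs)]
light = [0.5 * math.sin(2 * math.pi * 2 * t / fs) for t in range(fs)]
deep_label = label_sws(deep, fs, threshold=0.5)
light_label = label_sws(light, fs, threshold=0.5)
```

In practice, the threshold would have to be calibrated per animal and recording set-up, which is one reason such rules are difficult to transfer between labs.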
A more comprehensive approach was taken by Ruigt et al. (1989), who filtered the EEG signal into finer frequency bands, similar to Goeller and Sinton (1989). However, they used shorter epoch durations combined with linear discriminant analysis to differentiate more sleep stages than Goeller and Sinton (1989) and Chouvet et al. (1980). They identified six sleep microstates: QW, active wake, quiet sleep, deep sleep, pre-REM sleep and REM. It is important to mention that the pre-REM sleep periods here were equivalent to the TS reported in the early literature. The classification was based mainly on the movement detector, which enabled the differentiation between active wake and QW. Background EEG activity together with the EMG signal helped to map the EEG activity to its corresponding sleep states. Here, the spindles were detected primarily by the increase in power in the corresponding sigma band. This method represented one of the largest-scale studies, in which 32 rats were recorded simultaneously and the sleep scoring was automated.
Considering this era in sleep research, we observed that the introduction of small computing systems and the improvement in hardware recording systems enabled larger rodent sample sizes in experiments (see Ruigt et al. (1989)), providing more accurate results about their sleep classification. Furthermore, it allowed the use of more advanced analysis methods, e.g., FFT and linear discriminant analysis, in sleep classification. Two main studies provided a more accurate depiction of sleep structure in rodents: Ruigt et al. (1989) and Gandolfo et al. (1988). Several elements contributed to attaining these results, e.g., using shorter epochs (1 s), examining more frequency bands, and obtaining hippocampal and cortical EEG and EOG signals. This indicates that feature extraction and signal processing are critical for sleep classification regardless of the automated algorithm.
In sum, the automated methods developed in the 80s depended primarily on deploying software logic in sleep classification. This corresponded to further advancement in computing technology and the ability to use more advanced analysis techniques, e.g., clustering, linear discriminant analysis and FFT. Similar to the previous decade, the proposed automated systems offered a high global agreement relative to manually scored data (≈80%-96%).

| The 90s
In the 90s, computing technology, both hardware and software, continued to develop, which accelerated the classification process and allowed for the use of more advanced analysis techniques.
Another automated sleep classification approach was implemented by Kleinlogel (1990). In his approach, he used FFT to compute the power spectrum of the EEG signal obtained from the frontal and occipital cortex. By using 8-s epochs, he identified eight sleep stages in rodents: active-wake, theta-wake, wake, classical sleep I, classical sleep II, TS, transitional dozing and paradoxical sleep. The power spectrum of classical sleep I was mostly equivalent to the average power spectrum, with characteristic low beta over the frontal cortex and low theta over the occipital cortex. On the other hand, classical sleep II was characterized by a power spectral pattern similar to dozing, with complete absence of theta activity over the occipital cortex. Consistent with previous reports, Kleinlogel (1990) observed an increase in spindle activity prior to REM or if REM was expected. This method used the power spectrum and its deviation from the average values as the primary means to label the different sleep states. It is crucial to point out that power spectra comprise periodic and aperiodic fractions. Changes in the aperiodic part of the spectrum, and the subsequent change in its slope, might give rise to apparent changes in the oscillatory power at different frequencies (Gao, 2016). Therefore, it is critical to control for the power spectral slope before using the average spectra to identify different sleep states.
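One simple way to control for the spectral slope is to fit a straight line to the spectrum in log-log coordinates and inspect the residual. The sketch below is an illustration under assumptions, not a method taken from the cited studies: it fits the line across the whole frequency range, so strong oscillatory peaks will bias the fit, and all function names are hypothetical.

```python
import math

def fit_aperiodic(freqs, powers):
    """Least-squares line through (log10 f, log10 P); returns (slope, intercept)."""
    xs = [math.log10(f) for f in freqs]
    ys = [math.log10(p) for p in powers]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def remove_aperiodic(freqs, powers):
    """Log-power residual after subtracting the fitted 1/f-like trend."""
    slope, intercept = fit_aperiodic(freqs, powers)
    return [math.log10(p) - (intercept + slope * math.log10(f))
            for f, p in zip(freqs, powers)]

# Synthetic spectrum: pure 1/f^2 background, then the same with a peak at 7 Hz.
freqs = [float(f) for f in range(1, 33)]
background = [f ** -2 for f in freqs]
slope, _ = fit_aperiodic(freqs, background)

with_theta = [p * (5.0 if f == 7.0 else 1.0) for f, p in zip(freqs, background)]
residual = remove_aperiodic(freqs, with_theta)
peak_freq = freqs[residual.index(max(residual))]
```

Comparing residuals rather than raw band power reduces the risk that a change in slope alone is read as a change in oscillatory activity.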
In the same year, Itowi et al. (1990) proposed another method for sleep classification, which combined visual classification criteria with an automatic system that used EEG and EMG amplitude to classify sleep into the three main sleep stages. In their method, they used template matching by establishing a pre-template, which contained the optimum amplitude levels of the EEG and EMG signals according to the criteria used to classify sleep visually. Although some of the previous studies did not use EOG to classify sleep, this study explicitly argued that the EMG signal is adequate to classify sleep reliably. However, as we demonstrated before, employing EOG in sleep classification would enhance the detection of phasic and tonic REM sleep substates. Furthermore, because the study relied on criteria similar to those used for manual scoring, identifying the fine-grained sleep stages was not attainable.
As computational methods improved, there was a shift towards the use of more statistical and computational parameters. Karasinski et al. (1994) devised a method that used the statistical moments of the signal (standard deviation, skewness and kurtosis) and two further harmonic variables of the signal (zero crossings and the relative minima and maxima). Subsequently, each 8-s epoch was represented by these five numerical values. The amplitude histogram values exhibited a specific distribution, which aided the identification of the different sleep stages. For example, the wake state exhibited a Gaussian histogram shape, whereas the REM sleep histogram was flatter and was identified better by the kurtosis of the signal. This study, however, did not use EMG to identify sleep and wake patterns; instead, it used only the signal acquired from the ECoG. Nevertheless, they validated their results against other data in which ECoG and EMG were acquired, which yielded identical results. The REM sleep validation results (83%), compared to expert classification, were the lowest among the sleep states (96% for wake and 97% for NREM states). This outcome could be explained partially by the absence of EMG recording in this study, which is essential to disambiguate wake and REM states. It is challenging to use this method to inspect the substates of sleep, because identifying such states depends on the temporal changes in the structure and the power profile of the ongoing oscillation.
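The five-value epoch description can be sketched as follows. The exact definitions used by Karasinski et al. (1994) are not given here, so the choices below (population moments, non-excess kurtosis, sign changes as zero crossings, slope reversals as extrema) are assumptions, and the function name is hypothetical.

```python
import math

def epoch_features(samples):
    """Five summary values for one epoch; assumes a non-constant signal."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m3 = sum((x - mean) ** 3 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    # Sign change between consecutive samples counts as one zero crossing.
    zero_crossings = sum(1 for a, b in zip(samples, samples[1:]) if a * b < 0)
    # A reversal of the local slope counts as one relative minimum or maximum.
    extrema = sum(1 for a, b, c in zip(samples, samples[1:], samples[2:])
                  if (b - a) * (c - b) < 0)
    return {
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,  # equals 3 for a Gaussian amplitude histogram
        "zero_crossings": zero_crossings,
        "extrema": extrema,
    }

features = epoch_features([1.0, -1.0, 1.0, -1.0])
```

A flatter amplitude histogram, as reported for REM sleep, would show up here as a kurtosis below the Gaussian value of 3.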
In another study, Neckelmann et al. (1994) examined the validity and quality of the criteria used during manual sleep scoring and tried to develop a computer algorithm to semiautomatically identify different sleep stages. The algorithm was based primarily on the conventional criteria used for sleep classification: theta/delta ratio, sigma power, delta power and the variance of the EMG signal. The study demonstrated great consistency between human scorers who depended on visual scoring. However, this consistency declined when the computer-based scoring system was compared against a human scorer. Although the authors identified quiet and active waking in their classification, they grouped both states as wake. Interestingly, SWS-I was characterized by predominant spindle activity associated with delta oscillation in less than 50% of the scored epoch. In addition to spindles, SWS-II was identified by prominent delta oscillation in 50% or more of the scored epoch. Similar to Gottesmann et al. (1971) and Gottesmann et al. (1977), transition sleep exhibited spindle activity in the frontofrontal cortex and dominant theta oscillation in the frontoparietal region. They standardized the scoring system between experimenters and then tested it on the computer system to achieve a higher degree of reliability. In this study, the researchers used a combination of long (10 s) and short (2 s) epochs to extract the features of the EEG signal. This approach would be favorable for identifying the main sleep states and their associated substates. One of the main concerns of the researchers was the lack of tools to measure the temporal evolution of the spectral profile of delta oscillations, because it was difficult to implement more complex analytical tools that could capture these changes. Another critical point raised by the authors was the presence or absence of spindles in rodents. The researchers argued that an increase in sigma power, the main criterion used to identify spindles, might not in fact reflect the presence of spindles in specific epochs. This was an essential point. Because the lower boundary of the sigma band and the upper boundary of theta oscillations might overlap, it is necessary to distinguish both types of oscillations through careful scrutiny of the frequency content of each oscillation type and their spectral profiles relative to each other; for example, see the approach used by Bandarabadi et al. (2020).
The previous studies showed a clear transitional phase of sleep preceding REM sleep, which was identified as transition/pre-REM sleep. Targeting this specific sleep stage, Benington et al. (1994) developed an automated approach to identify transition sleep states while inspecting other sleep states. The algorithm used delta, theta and sigma power together with the EMG signal to classify sleep into four states: wake, SWS, transition to REM and REM sleep. Identifying this unique sleep state is critical in determining the start of REM sleep. Several interesting observations were made in this study. First, NREM sleep was interrupted by brief periods of waking or REM sleep. These brief interruptions were interpreted by the authors as wake periods. However, these interleaved brief wake periods could represent microarousal states, a form of wake that was shown to exhibit specific electrophysiological characteristics (Dos Santos Lima et al., 2019). Second, the authors observed periods of decreased delta, theta and sigma power that were also classified as wake. Nevertheless, these periods could represent QW or a drowsiness state, which might be different from microarousals and active wake (see Section 4: Different Potential Light NREM Sleep Stages in Rodents).
To improve the classification further, Witting et al. (1996) devised an online classification system based on continuous evaluation of the power spectrum of the EEG signal. Instead of filtering the signal into different frequencies, the method relied on computing the Euclidean distance between 10 averaged power spectra (1-32 Hz) within one epoch and a standard power spectrum for each state. The states identified by this method were QW, active wake, REM and quiet sleep. Notably, the authors addressed an important problem regarding EMG signal fluctuations that made the distinction between wakefulness and REM sleep challenging. Online classification of sleep could be useful in sleep deprivation studies in order to examine the contribution of different sleep states to memory consolidation. However, the researchers used Euclidean distance as the metric to measure the relationship between the computed and standard power spectra. Euclidean distance can be robust for measuring distances in 2-D spaces but less so in higher dimensional spaces. Additionally, averaging the power spectra (10 spectra/10 s) could obscure relevant brief spectral changes associated with, for example, the transition sleep state.
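The minimum-distance principle can be sketched in a few lines. The template values and the four-bin spectra below are hypothetical placeholders rather than values from Witting et al. (1996), and the function names are assumptions.

```python
import math

def mean_spectrum(spectra):
    """Average several power spectra bin by bin."""
    return [sum(bins) / len(bins) for bins in zip(*spectra)]

def euclidean(a, b):
    """Euclidean distance between two equally long spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify_spectrum(spectrum, templates):
    """Return the state whose standard spectrum is closest to the epoch."""
    return min(templates, key=lambda state: euclidean(spectrum, templates[state]))

# Hypothetical standard spectra (relative power in four coarse bins).
templates = {
    "quiet sleep": [0.70, 0.15, 0.10, 0.05],
    "REM": [0.15, 0.65, 0.05, 0.15],
    "quiet wake": [0.30, 0.30, 0.10, 0.30],
}
epoch = mean_spectrum([[0.25, 0.55, 0.05, 0.15], [0.15, 0.65, 0.05, 0.15]])
state = classify_spectrum(epoch, templates)
```

The averaging step makes the trade-off mentioned above visible: two sub-spectra that differ within an epoch are collapsed into one vector before any distance is computed.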
An important change in the classification procedure was introduced by Robert et al. (1996), who used artificial neural networks (NN). Similar to Karasinski et al. (1994), they used three statistical variables and two temporal variables to classify sleep into the three main sleep phases. The classification was conducted in several phases. First, the learning phase, which was intended to determine the strength of connectivity between different cells within the NN. Second, the working phase, which associated each epoch with its possible corresponding vigilance state. Finally, there was a consensus phase, which used epochs classified by humans as an input to the network to measure the consistency of the classified states (agreement between the network and manual scoring). In this study, two NN architectures were used: a point network and a chronological network. The point network processed the input data epoch by epoch, whereas the chronological network considered the epochs preceding and following each epoch. Because the chronological network examined the temporal evolution of the extracted features, it gave a slightly better result than the point network. This observation demonstrated the importance of considering the temporal organization of different sleep states by employing, for example, Markov chain probabilities within the NN to compute the transition probabilities between different sleep states. The system, however, had to be adapted on a rat-by-rat basis, making it difficult to generalize the approach.
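As an illustration of the Markov chain idea, first-order transition probabilities can be estimated directly from a scored hypnogram. This sketch is not part of the method of Robert et al. (1996); the state labels and function name are assumptions.

```python
from collections import Counter, defaultdict

def transition_probabilities(hypnogram):
    """Estimate P(next state | current state) from consecutive epoch labels."""
    counts = defaultdict(Counter)
    for current, following in zip(hypnogram, hypnogram[1:]):
        counts[current][following] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }

# Hypothetical sequence of scored epochs (W = wake, N = NREM, R = REM).
hypnogram = ["W", "W", "N", "N", "N", "R", "W"]
probs = transition_probabilities(hypnogram)
```

Transitions that never occur in the scored data, such as wake directly to REM in this toy sequence, are simply absent from the table, which a classifier could use to flag implausible label sequences.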
Considering the sleep scoring attempts in the 90s, we made the following observations. Consistent with previous studies, NREM sleep was shown to exhibit two main substates, SWS I and SWS II (Kleinlogel, 1990; Neckelmann et al., 1994), and the REM sleep state was preceded by periods of transition sleep (Benington et al., 1994; Kleinlogel, 1990; Neckelmann et al., 1994). On the other hand, REM sleep was considered in all of the studies as a homogeneous single state, disregarding its phasic and tonic substates. We also observed that combining long and short epochs during sleep classification could enhance the algorithms' sensitivity towards identifying the fine-grained structure of sleep architecture (Neckelmann et al., 1994). Furthermore, employing artificial NN in sleep classification could expand our understanding of sleep structure. Nevertheless, it would be essential to consider the temporal evolution of sleep states, similar to Robert et al. (1996), who showed that a chronological network gave better results than point networks. Other elements could be implemented to improve NN performance, for example, short epoch lengths and Markov chain properties within the network.
In summary, the progress in automated sleep scoring was associated with the advancement in computing technology and the affordability of advanced statistical methods. Interestingly, the systems developed during this decade uncovered the potential use of artificial NN in sleep scoring. However, no significant improvements were achieved with regard to the accuracy of automated scoring systems (≈80%-98%) compared to the earlier work.

| The 2000s and beyond
In the 2000s, researchers tended to use even more advanced analysis methods, which included machine learning and artificial NN applications. It is important in this context to distinguish supervised and unsupervised machine learning methods. For the former, the algorithm is trained to distinguish sleep stages based on a training dataset previously labelled by a human scorer. On the other hand, unsupervised algorithms attempt to infer the relationship between the unlabelled states, driven mainly by the features of the data. The human intervention in supervised machine learning occurs upfront to assist the process of learning, whereas in unsupervised learning it is necessary to inspect the outcome in order to refine the output of the algorithm.
In one study, Costa-Miserachs et al. (2003) used a spreadsheet approach to automatically classify different sleep stages based on the EEG signal obtained from the hippocampus together with the EMG signal from the neck muscle. The system had indices for accuracy that allowed human control over the output of the algorithm. The multistep approach started with assigning different sleep stages to specific epochs. For example, epochs with a high theta/delta ratio and low EMG were classified as REM epochs. Then, the statistics for each identified state were recomputed and fed back into the classification according to specific rules. Construction of a 20-s epoch hypnogram enabled the assignment of undetermined epochs to their previous states, demonstrating the effect of combining long and short epochs during sleep classification. Interestingly, the system contained different indices to measure the accuracy of the classification, which assisted the operator in accepting or rejecting specific results. The indices measured doubtful epochs, such as NREM epochs that were classified as wake and did not meet the delta or EMG criteria. Such post-classification checks would be essential to implement in future algorithms by, for example, using hidden Markov models to compute the probability of assigning a label to a sleep state considering the previous and following states.
Another system that combined visual and computer-based classification was designed by Mileva-Seitz et al. (2005). Because storing the raw data traces and using them in classification could be computationally expensive, the authors replaced the usual polysomnogram with a pseudopolygram. The pseudopolygram was created by summing three sine waves with distinct frequencies of 3, 7 and 60 Hz, respectively. For example, a specific epoch exhibiting high power at 7 Hz would be classified as REM sleep. Although this approach could potentially accelerate the computation, current advances in computing and data storage allow researchers to store huge amounts of data. The approach could still be considered by labs that do not have access to current technology. Furthermore, this approach ignored the variability in the frequency range between animals and between different states.
Another automated scoring approach was based on a naïve Bayes classifier (Rytkönen et al., 2011). In this classification system, the authors used Bayes' theorem to assign the probability of each sleep state depending on the EEG and EMG values. The authors created manually labelled epochs for learning. Then, a vector of EEG and EMG features was used to classify the sleep patterns into the three distinct sleep stages (Table 1). Manual scoring of part of the data made this algorithm adaptable to individual differences in the recording.
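A minimal sketch of this kind of classifier is given below. It assumes Gaussian likelihoods for each feature, which is a common choice but not necessarily the one used by Rytkönen et al. (2011); the feature vectors (log delta power, log theta/delta ratio, log EMG) and all names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(features, labels):
    """Per-state prior plus per-feature mean and variance from labelled epochs."""
    grouped = defaultdict(list)
    for x, y in zip(features, labels):
        grouped[y].append(x)
    model = {}
    for state, rows in grouped.items():
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-6
            stats.append((mean, var))
        model[state] = (len(rows) / len(features), stats)
    return model

def predict(model, x):
    """Return the state with the highest log posterior under Bayes' theorem."""
    def log_posterior(state):
        prior, stats = model[state]
        score = math.log(prior)
        for value, (mean, var) in zip(x, stats):
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Hypothetical manually labelled training epochs.
train_x = [
    [0.2, 0.1, 2.0], [0.3, 0.0, 2.2], [0.1, 0.2, 1.8],     # wake
    [1.5, -0.8, 0.5], [1.7, -0.9, 0.4], [1.6, -0.7, 0.6],  # NREM
    [0.3, 0.9, 0.1], [0.2, 1.1, 0.2], [0.4, 1.0, 0.15],    # REM
]
train_y = ["W"] * 3 + ["NREM"] * 3 + ["REM"] * 3
model = fit_gaussian_nb(train_x, train_y)
predicted = predict(model, [1.6, -0.8, 0.5])
```

Because the model is fitted on epochs scored for a given recording, refitting it per animal is what allows the classifier to follow individual differences.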
The physiological demands of the body fluctuate in response to wake-sleep cycles; therefore, these fluctuations and body movement could potentially be used for sleep classification. These fluctuations arise from the fact that several neurotransmitters (e.g., norepinephrine) and their associated brain areas (e.g., locus coeruleus) exhibit different levels of activity during the wake-sleep cycle and therefore also induce changes in cardiovascular and respiratory frequencies (Swift et al., 2018b). Indeed, Zeng et al. (2012) used respiratory and movement information to classify sleep stages automatically using a support vector machine approach. They used a Doppler radar method that detected respiratory patterns and movement. Interestingly, respiratory movement was the only movement detected during sleep and exhibited a particular pattern corresponding to NREM and REM sleep. Specifically, REM sleep was characterized by an irregular respiratory pattern, whereas NREM sleep exhibited a slow, steady rate. Although the system represented a noninvasive method for sleep classification, it was challenging to classify the fine-grained sleep patterns of rodents based only on respiratory movements. In this study, several features were extracted from the signal (Table 1), and then the features were projected into a higher dimensional space, which was reduced using principal component analysis. Subsequently, a support vector machine was applied to classify the clusters into the distinct sleep patterns. By recording high-resolution videos during sleep, Geuther et al. (2022) found that changes in both body area and shape reflected different sleep patterns. Additionally, Vanneau et al. (2021) managed to identify the different sleep stages using noninvasive piezoelectric sensors in the home cage. Although the above-mentioned methods were noninvasive, they were not adequate to characterize the fine-grained sleep stages.
A fully automated approach was introduced by Sunagawa et al. (2013). In this approach, the fully automated sleep staging method via EEG/EMG recordings (FASTER) algorithm used three main elements: signal feature extraction, clustering and then annotation. The FFT was computed from the EEG/EMG signal, and then the number of dimensions was reduced using principal component analysis. Four components were used in the study, because this number achieved the highest detection sensitivity for REM sleep. The clusters were annotated using the median logarithm of EMG and EEG delta power. Interestingly, the annotation step identified REM sleep epochs by exclusion. In other words, if a specific cluster exhibited a delta power value below a specific percentile of delta power, the cluster was annotated as a REM sleep cluster, demonstrating the difficulty of accurately classifying REM sleep epochs compared to wake.
In another automated approach, Bastianini et al. (2014) relied on the theta/delta ratio together with the root mean square of the EMG signal to classify sleep into three distinct states. The algorithm differentiated between wake and sleep through the EMG values. Epochs with lower EMG values, which corresponded to sleep, were classified further into REM or NREM sleep based on the theta/delta ratio. The features used made it difficult to identify any fine-grained sleep structures, and the reliance on the theta/delta ratio alone might bias the classification when other features, for example sigma power (spindles) for identifying TS periods, are not taken into account.
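The two-step logic described above can be written as a short decision rule. The threshold values and the function names are assumptions for illustration; Bastianini et al. (2014) should be consulted for the actual criteria.

```python
import math

def rms(samples):
    """Root mean square of a signal segment."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def score_epoch(emg_samples, theta_power, delta_power,
                emg_threshold, ratio_threshold):
    """Step 1: EMG separates wake from sleep. Step 2: theta/delta splits sleep."""
    if rms(emg_samples) > emg_threshold:
        return "W"
    return "REM" if theta_power / delta_power > ratio_threshold else "NREM"

# Hypothetical epochs: high muscle tone, low tone with theta, low tone with delta.
wake = score_epoch([3.0, -3.0, 3.0, -3.0], 1.0, 1.0, 1.0, 1.5)
rem = score_epoch([0.1, -0.1, 0.1, -0.1], 4.0, 1.0, 1.0, 1.5)
nrem = score_epoch([0.1, -0.1, 0.1, -0.1], 1.0, 4.0, 1.0, 1.5)
```

With only two features, every sleep epoch must end up as either REM or NREM, which illustrates why transition sleep cannot be recovered without an additional feature such as sigma power.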
An important step towards unifying automated sleep scoring across labs and across animals was made by Miladinović et al. (2019) with sleep phase identification with NN for domain-invariant learning (SPINDLE). They devised a scoring algorithm using a convolutional NN (a hybrid deep NN and hidden Markov model approach), and they collected data from different labs, different animals and different recording set-ups. Importantly, they used a method that took into account the variability of a specific state within a subject, between subjects and between different animal species. To achieve this goal, it was critical to examine the variability in the EEG profile, and a preprocessing step was used to decrease the variability between subjects. The convolutional neural network was robust to small deviations in the spectral patterns. Although the method was unique and used quite an advanced classification system, which might represent the future of sleep classification in rodents, they only managed to classify the three main sleep stages in rodents (W, N and R). On the other hand, Wei et al. (2019) managed to detect five sleep stages in rats by using a decision tree automated sleep scoring method that was based on different rules considering several features of the signal. The latter study was the only recent study that was able to identify two NREM states and the TS state. The better outcome of this study was a result of computing more features compared to other studies (see Table 1) and the use of short classification epochs. Furthermore, the authors used a two-stage classification system in which 10-s epochs were first classified into one of the three main sleep stages, and then a second classification was conducted using 2-s windows to assign the epochs further into one of five states (W, NREM1, NREM2, TS and REM).
Although most of the previous studies used low-frequency components as markers for the different sleep stages, Silva-Pérez et al. (2021) introduced a new marker based only on the high-frequency oscillations within the signal (high-frequency index). The index was computed as the dimensionless ratio between the 110-200 Hz and 110-300 Hz bands. This index yielded three different value ranges that differentiated between wake, NREM and REM sleep. The variability in the high-frequency components of the signal during sleep could be attributed to the associated variability in high gamma (Sirota et al., 2008; Tort et al., 2008) and dense ripple activity during sleep (Samanta et al., 2020). However, it is challenging to rely only on high-frequency oscillations to observe the fine-grained sleep states.
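A band ratio of this kind can be sketched as below. The sketch assumes that the index is the ratio of summed spectral power in the two bands, which is an interpretation and not confirmed by the description above; the function names and the flat test spectrum are hypothetical.

```python
def band_power(spectrum, f_lo, f_hi):
    """Sum the power of all (frequency_hz, power) bins inside [f_lo, f_hi]."""
    return sum(p for f, p in spectrum if f_lo <= f <= f_hi)

def high_frequency_index(spectrum):
    """Dimensionless ratio of 110-200 Hz power to 110-300 Hz power (assumed)."""
    return band_power(spectrum, 110, 200) / band_power(spectrum, 110, 300)

# Hypothetical flat spectrum sampled every 10 Hz from 100 to 300 Hz.
flat_spectrum = [(f, 1.0) for f in range(100, 310, 10)]
hfi = high_frequency_index(flat_spectrum)
```

Because the numerator band is contained in the denominator band, the index is bounded between 0 and 1, and a flat spectrum gives 0.5 for the bin spacing used here.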
Taken together, in this section, we covered the most relevant literature connected to automated sleep scoring in the 2000s (other literature not covered in the text but included in Table 1: Kohtoh et al. (2008), Stephenson et al. (2009), Gross et al. (2009), Crisler et al. (2008)). The results highlight that in recent years, researchers exploited the substantial advancement in modern statistical and machine learning methods. This was greatly enabled by the affordability of modern computing technology. Accordingly, several machine learning methods were employed in designing advanced and fast automated sleep classification systems (for a comparison of different algorithms, see also Smith et al. (2021)). Interestingly, however, this progress in using state-of-the-art computing methods did not seem to lead to any new insight (at least until now). Most scorers still rely on only the same three states (wake, non-REM and REM), and the new methods have not contributed to a standardization of recognizing these states or any substates. Studies using data-driven approaches have identified many microstates, but these usually remain self-contained within the article, and there is no attempt to identify their corresponding states in human sleep. Therefore, although advances in technology have clearly been applied to automatic sleep scoring from the 60s to current times, scoring accuracy and sleep state description have not actually progressed. The lack of progress could be attributed to several reasons. Some of the recent progress came mainly from the engineering and computational side without the contribution of sleep researchers to the process. Another problem is the lack of a standardized definition of the different sleep states in rodents that could be used in refining and improving classification algorithms. Finally, sleep is an interdisciplinary field that requires communication between researchers from different fields to regulate and standardize the sleep classification process.

| MANUAL VERSUS AUTOMATIC SCORING
As summarized above, there have been many attempts to develop an automatic scoring system for rodent sleep data. Not all automatic procedures have performed similarly well. The disagreement between manual and algorithm-based scoring might arise for several reasons, which we will highlight next.
First, rodent sleep cycles are much shorter than human ones, around 10 min in contrast to 60-120 min in humans (Lacroix et al., 2018). Accordingly, rodents might spend much shorter time periods in each sleep state relative to humans. It is challenging to identify these brief periods during manual classification; therefore, the agreement with automatic scoring methods will be worse because these brief periods are missed during manual scoring. Second, an automated scoring algorithm takes into account the most prevalent state within an epoch during classification and might be blind to the heterogeneity of a specific epoch that may consist of mixed states. Human scorers can identify such heterogeneity much more easily than scoring algorithms. Therefore, the duration of the epochs used during classification needs to be adapted to the purpose of the classification. The best approach is to use a mix of long and short epochs during sleep classification. Third, sleep is a complex phenomenon, in humans and in rodents, that includes events of brief duration or events masked by other ongoing electrophysiological phenomena. Therefore, taking only human scores as a reference for the quality of the classification of a specific algorithm disregards the complexity of sleep. Even in human sleep research, lumping 30 s of data together to determine the current state might lead to skipping the classification of much shorter microstates. Fourth, careful detection of artefacts is critical for an accurate, automatic sleep classification. The artefacts could arise from the system or from sudden movements of the animal. Due to the polyphasic sleep of rodents, with more awakenings during the sleep period, rodent recordings tend to have more artefacts than human recordings. Fifth, sleep is a continuous phenomenon, not a discrete one; therefore, discretizing sleep into separate epochs might by default introduce misclassification.
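The second point, that a single label per epoch hides mixed states, can be illustrated by scoring short sub-epochs and summarizing them. This is a sketch under assumptions (for example, five 2-s sub-epochs within a 10-s epoch), and the function name is hypothetical.

```python
from collections import Counter

def summarize_epoch(sub_labels):
    """Return the majority state, its share of the epoch, and a mixed-state flag."""
    counts = Counter(sub_labels)
    state, n = counts.most_common(1)[0]
    return state, n / len(sub_labels), len(counts) > 1

# Hypothetical 10-s epoch scored in five 2-s windows with a brief awakening.
state, purity, mixed = summarize_epoch(["N", "N", "N", "W", "N"])
```

Reporting the share of the majority state alongside the label keeps brief events, such as a 2-s arousal, visible instead of silently absorbing them into the surrounding state.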
Manual scoring in rodents tends to focus on separating only three main states (wake, NREM and REM). Automatic scoring algorithms and other data-driven approaches are often directed at recapitulating these states or at identifying more microstates without relating these microstates to the well-known states in humans (wake, S1-S4 NREM and phasic/tonic REM). However, with the current advancement in computing power and machine learning, we could create a new scoring system that advances our understanding of sleep architecture. In other words, our goal should be to expand our current understanding of sleep architecture rather than to implement more advanced analysis techniques only to track the three main sleep states. For this, it would be critical to use data-driven approaches to distinguish the different microstates, but it would also be critical to then relate these to the three main sleep stages (W, NREM and REM) as well as to the substages known from humans. We should also not remain restricted by the states defined by Rechtschaffen and Kales, because the classification based on visual criteria in 30-s epochs was created in the precomputing era with analogue recording systems and may be lumping different states into one. An ideal automated system requires the confidence of the researcher and should take into account the variability that might arise from the genetic background of the animal, individual variability, changes in EEG power and frequency due to electrode position, as well as different disease models. But to achieve this ideal automated system, it will also be important to create consensus-based definitions and criteria for rodents, as we have for humans.
Furthermore, the development of a reliable online automated scoring system would be important in sleep deprivation studies. In fact, Libourel et al. (2015) devised an unsupervised Bayesian classification algorithm that was able to identify the three main sleep states while being connected to another system that selectively interfered with REM sleep. The online identification of different sleep states and the deprivation of specific states would expand our understanding of the contribution of various sleep states to memory consolidation.
Finally, sleep classification depends extensively on computing the power spectra in arbitrarily predefined frequency bands. The boundaries of these frequency bands differ not only between humans and rodents, for example, for theta oscillations (Jacobs, 2014), but also between studies (see Table 1). Therefore, the boundaries of each frequency band need to be stated explicitly in each study.
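How strongly band power depends on the chosen boundaries becomes apparent once the boundaries are written down as explicit parameters. The following is a minimal sketch using only the Python standard library (a plain discrete Fourier transform; in practice an FFT library would be used). The band limits in `BANDS`, the sampling rate and the synthetic signal are illustrative assumptions and are not taken from any of the studies cited here.

```python
import cmath
import math

# Band boundaries in Hz. These values are illustrative assumptions only;
# boundaries differ between studies and should be reported explicitly.
BANDS = {"delta": (0.5, 4.0), "theta": (6.0, 10.0)}


def power_spectrum(signal, fs):
    """Return (frequencies, power) of the one-sided DFT of a real signal."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * fs / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power


def band_power(signal, fs, bands=BANDS):
    """Sum spectral power within each band, lower bound inclusive."""
    freqs, power = power_spectrum(signal, fs)
    return {name: sum(p for f, p in zip(freqs, power) if lo <= f < hi)
            for name, (lo, hi) in bands.items()}


# Synthetic 2-s epoch sampled at 64 Hz: a strong 7-Hz component plus a
# weaker 2-Hz component (a crude stand-in for a theta-dominated epoch).
fs = 64
epoch = [math.sin(2 * math.pi * 7 * t / fs)
         + 0.2 * math.sin(2 * math.pi * 2 * t / fs)
         for t in range(2 * fs)]

powers = band_power(epoch, fs)
theta_delta_ratio = powers["theta"] / powers["delta"]
```

Shifting the lower theta boundary above 7 Hz in this toy example would move the dominant component out of the theta band entirely, which shows why two studies with different boundaries can reach different classifications from the same signal.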

| ELECTRODE PLACEMENT
Another important issue affecting sleep classification is electrode placement. Human sleep scoring relies on cortical surface EEG, whereas in rodents, there is no standard for electrode placement, as we will discuss next.
The Rechtschaffen and Kales criteria demand the use of central electrodes for EEG (often parietal cortex), EMG chin electrodes and EOG electrodes. Because slow oscillations originate from frontal brain areas and are more visible there, and the wake alpha oscillation is more dominant in occipital areas, the new AASM criteria require frontal and occipital electrodes for scoring in addition to the parietal electrode. In rodents, the frontal electrodes are essential for identifying slow delta oscillations and anterior spindles, whereas the occipital and hippocampal electrodes are critical for distinguishing theta oscillations. Indeed, in rodents, with their smaller brains and no possibility of using noninvasive, glued electrodes, frontal and parietal or occipital electrodes are common, whereas EMG and EOG electrodes are less commonly applied (see Table 1 for further details about electrode placement in different rodent sleep studies).
Instead, the arousal state is determined by video classification or by an accelerometer signal that is used as an EMG replacement (see Lima et al., 2017). Further, and unlike in humans, many researchers use invasive implantation methods and depth electrodes in rodents to uncover the neural mechanisms underlying sleep and memory. This allows researchers to record the local activity within single brain areas, unlike cortical electrodes (in humans or rodents), which measure the sum of the neural activity over several brain areas. These differences in electrode usage and placement will affect sleep scoring and determine which features can be used for microstate classification. Considering which signal is used for sleep classification is of special importance when only a single brain area is recorded. For example, most of the sleep stages are classified mainly within the low-frequency range (<20 Hz), which differs along the anteroposterior axis of the brain as mentioned above. Thus, slightly different electrode positions can lead to differences in sleep classification, especially with automated systems (Fang et al., 2009). In other words, in sleep studies that implant only a single electrode in the hippocampus, for example, Winson (1976) and Costa-Miserachs et al. (2003), it is not possible to accurately examine the delta spectral dynamics because they are stronger on the frontal electrodes. Furthermore, the identification of transition sleep depends on the persistence of slow oscillations and spindles in the frontal electrodes associated with the appearance of theta oscillations on the posterior electrodes. Thus, targeting a single brain area for sleep classification might conceal the rich dynamics of rodent sleep, resulting in either misclassification of specific states or inaccurate identification of the beginning of each state.
Another rising challenge is the modern view that sleep is more of a local and less of a global phenomenon (Genzel & Dresler, 2012). Depending on their usage during the preceding day, some brain areas will show a local upregulation of slow wave activity (Huber et al., 2004; Siclari & Tononi, 2017; Vyazovskiy et al., 2002). This can potentially affect both human and rodent sleep scoring, because some brain areas might precede others in showing the NREM or REM state. More importantly, the placement of depth electrodes in rodents, for example, in the hippocampus, enhances the spatial resolution of the recorded local field potential signal and thus improves the observation of changes in the local oscillatory dynamics of single brain areas. Accordingly, it is not clear how we should score sleep states when one brain area shows signs of one sleep state while another brain area is in a different state. For example, REM sleep tends to start in the hippocampus, and it takes a few seconds for the characteristic theta oscillation to become visible in the parietal cortex and then another few seconds for it to travel forward to the frontal cortex. During NREM sleep in rodents, for example, one can observe the emergence of slow oscillations (delta waves) in the cortex but not in the hippocampus (Figure 4). However, after intense learning experiences, animals exhibit larger and denser slow oscillations in the cortex. Furthermore, these oscillations occasionally become even larger, and additional coordinated slow oscillations concomitantly occur in the hippocampus (Figure 4). Because all standard methods in sleep classification were developed with the human surface EEG in mind, such local phenomena, as well as how many brain regions are synchronized during a particular state, were not considered in these criteria.
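The scoring problem posed by regionally staggered transitions can be made concrete with a small sketch. It assumes that each recording site has already been scored independently at 1-s resolution; the region names, the label sequences and the helper functions `first_onset` and `mixed_state_samples` are hypothetical and only illustrate the idea of quantifying disagreement between sites.

```python
def first_onset(labels, state):
    """Index of the first sample labelled `state`, or None if absent."""
    return labels.index(state) if state in labels else None


def mixed_state_samples(region_labels):
    """Indices at which the regions do not all carry the same label."""
    columns = zip(*region_labels.values())
    return [i for i, states in enumerate(columns) if len(set(states)) > 1]


# Hypothetical per-second labels for a NREM-to-REM transition that appears
# first in the hippocampus and last in the frontal cortex.
regions = {
    "hippocampus": ["NREM"] * 3 + ["REM"] * 7,
    "parietal": ["NREM"] * 5 + ["REM"] * 5,
    "frontal": ["NREM"] * 8 + ["REM"] * 2,
}

onsets = {name: first_onset(labels, "REM") for name, labels in regions.items()}
mixed = mixed_state_samples(regions)
```

In this toy example, REM onset differs by 5 s between hippocampus and frontal cortex, and for those 5 s no single global label describes the whole brain. Whether such intervals should be scored as a separate transition state or assigned to one of the neighbouring states is exactly the kind of question that current criteria leave open.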
In sum, the use of electrodes deep in the brain in rodents has highlighted the shortcomings of current sleep stage classifications, even in humans. We define sleep states by the occurrence of specific oscillations in the cortex; however, this does not account for the more diverse picture in which different activity in deep brain areas may accompany the same cortical EEG traces.

| CONCLUSION
Fifty-four years after its first publication, the Rechtschaffen and Kales manual on sleep scoring is still the standard of sleep research. Surprisingly, after half a century of moving from visual scoring on paper to automatic systems with hardware, software and now machine learning solutions, we seem not to have advanced much in our classification systems. Human research still relies on fixed 30-s epochs and four (or five) sleep stages, whereas rodent research is stuck at three stages. Perhaps it is now time to move on and rethink how we classify sleep. Data-driven approaches in particular have the potential to expand how we consider sleep states. But if new classification systems or microstates are defined, it will be of utmost importance that new states are always described in relation to the known, traditional sleep states. Furthermore, it is important that these states are shown in multiple data sets, ideally stemming from different species. Otherwise, we run the risk of each sleep study remaining an isolated island.
When redefining sleep and sleep states, there are multiple outstanding questions and issues that need to be considered:
• Should we keep 10-s/30-s epochs (rodent and human, respectively)? Sleep states show gradual transitions (e.g., gradual transition from S1 to S4 and REM) and do not occur in fixed intervals; instead, sleep is a continuous process. Should sleep scoring not reflect this?
• How should one consider whole brain synchronization and different brain areas being in different sleep states at the same time during sleep scoring? Should brain area involvement be a sleep stage criterion?
• How should we deal with mixed states where the spectral profile is not clearly one or the other state?
• In rodents, there are multiple candidates for light sleep, highlighting the potential diversity of this state. Should we subclassify this state in humans?
• When defining sleep, it will also be important to define wake states. There are no standardized criteria for this in humans or rodents, with some studies distinguishing active (theta dominated) wake from QW. However, QW will contain grooming, chewing, resting (head up, not moving), drowsiness/Stage 1 sleep (head resting) and microarousals (mini wake in the middle of sleep), just to name a few possibilities. Each wake substate will potentially have a different function.
It will be critical to resolve these questions and create a standardized framework of mammalian sleep. Only once we have this can we even consider applying it to other species (see Box 1).

Box: Other Animals
A large proportion of the issues set out for human versus rodent sleep scoring, whether manual or automated, need to be resolved not only for the sake of comparison between mammalian species but also to aid comparative research on sleep in non-mammalian species. With the technical advances of the last decades, sleep research has spread out over a large variety of mammalian and non-mammalian species (e.g., Nath et al., 2017; Zaid et al., 2022) and is more often examined in the wild, which comes with its own advantages and limitations (Rattenborg et al., 2017). This expansion of model species for sleep has not only shed new light on the evolution of sleep but has also shown that sleep is not easily divided into the same (sub)states in all species. Next to mammals, birds are most often used as study species for sleep. Although birds show similar sleep states (i.e., NREM and REM sleep) to mammals, there are several differences (e.g., short sleep bout length, lack of substates in NREM sleep and the absence of certain sleep-related brain oscillations; Rattenborg et al., 2019; van der Meij et al., 2019), which make it difficult to automate sleep scoring in these animals. Automated sleep scoring becomes even more problematic when it comes to species whose precise sleep architecture is still unclear, as in the case of reptiles, where electrophysiological evidence is variable.
For instance, although one study reported evidence for an NREM- and REM-like sleep state (Shein-Idelson et al., 2016) in the bearded dragon (Pogona vitticeps), another study examining sleep in a closely related species painted a somewhat different picture (Libourel et al., 2018). Even though both studies report two different electrophysiological sleep states during behavioural sleep, neither of these states completely matches the NREM and REM sleep states classically observed in mammals (Libourel & Barrillot, 2020). So, with every new species examined, the separation of sleep into two distinct states, once thought to be clear, becomes more and more challenged. Meanwhile, the urge to find commonalities and the need to make research results comparable across species often result in the misuse of nomenclature, a lack of openness towards potentially intriguing differences and, in the end, the use of many different analytical approaches, which might cause more confusion than insight. To be able to broaden our scope on sleep using a large variety of species, we will thus first need to create a unified framework of sleep in mammals. Only when we are able to reach consensus there can a realistic step be made towards comparison with and between different non-mammalian species.
Shown are different behaviours seen in the task and sleeping environment (unpublished pictures).

FIGURE 4 Different slow oscillation states in non-REM. On top, an example of 'normal' non-REM, whose scale is adapted for the later traces. In the box, amplitude is magnified by four, and in the box, the scaling is as in Figure 2. Below are examples of very large cortical slow oscillations, without or with synchronized hippocampal slow oscillations. Shown are unpublished data.