Performance analysis of machine learning algorithms on automated sleep staging feature sets

With the accelerating pace of social activity, rapid changes in lifestyle, and increasing pressure in professional fields, people are suffering from several types of sleep-related disorders. It is a very tedious task for clinicians to monitor the entire sleep duration of subjects and to analyse sleep stages manually in a traditional laboratory environment. For the accurate diagnosis of different sleep disorders, we consider the automated analysis of sleep epochs collected from subjects during sleep. An automated approach to sleep stage classification is carried out in four main steps: pre-processing of the raw signals, feature extraction, feature selection, and classification. In this study, we extracted 12 statistical properties from the input signals. The proposed models are tested on three different combinations of feature sets. In the first experiment, the feature set contained all 12 features; the second and third experiments were conducted with the nine and five best features, respectively. The patient records come from the ISRUC-Sleep database. The highest sleep staging accuracy was achieved with the five-feature set. From the categories of the subjects,


| INTRODUCTION
The analysis of human brain behaviour is very important in many health sector applications, such as the diagnosis of mental and neurological disorders and abnormalities. Human brain activity is commonly analysed through recordings of the electroencephalogram (EEG) signal. The major role of EEG is to help physicians diagnose various diseases accurately from the recorded behaviour of the subjects; for example, it assists sleep experts in identifying sleep irregularities and predicting epileptic seizures [1]. In addition, sleep-related disorders are currently a major global challenge, because proper sleep maintains the balance between physical and mental health; sleep directly affects quality of life, and poor sleep can have adverse consequences for the body [2]. Sleep studies therefore give insight into various health risks such as memory impairment, diabetes, and cardiovascular diseases [3]. Hence, analysing the possible symptoms of sleep problems is vital, and sleep stage scoring is necessary in almost all such conditions.
The purpose of sleep scoring is to identify irregularities from polysomnography recordings obtained from patients during sleep. Polysomnography is a multi-parameter test that helps in analysing and interpreting multiple simultaneous activities in the body during sleep through recordings of different electrophysiological signals. During the test, physiological signals are recorded from different parts of the body: brain activity is captured by the EEG; chin and left and right limb activity by the electromyogram (EMG); eye movements by the electrooculogram (EOG); and heart rhythm by the electrocardiogram (ECG), along with respiratory signals, airflow signals, and oxygen saturation [4]. Sleep experts score all these recordings according to standard sleep manuals such as the R&K rules, documented by Rechtschaffen and Kales [5]. According to the R&K rules, sleep is classified into three main states: wake (W), non-rapid eye movement (NREM), and rapid eye movement (REM). NREM sleep is further divided into the sub-stages N1-N4: N1 and N2 are considered light sleep, and N3 and N4 deep sleep. Under the R&K rules, each epoch is 30 s of data. In 2007, another sleep standard, the American Academy of Sleep Medicine (AASM) manual, was published with minor revisions to the earlier R&K guidelines [6]. According to the AASM, sleep is segmented into five stages. The only change between the AASM and R&K rules is the total number of sleep stages: the AASM combines NREM stage 3 and NREM stage 4 into a single stage, called NREM stage 3, while the other sleep stages are the same as in the R&K rules.
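The only difference between the two standards is the merging of R&K stages 3 and 4, so converting an R&K hypnogram to AASM labels is a simple lookup. This is a minimal sketch: the label strings follow this paper's N1-N4 naming, and the names `RK_TO_AASM` and `to_aasm` are hypothetical, not taken from either manual.

```python
# Hypothetical label strings; R&K stages use the N1-N4 naming of this paper.
RK_TO_AASM = {
    "W": "W",
    "N1": "N1",
    "N2": "N2",
    "N3": "N3",   # R&K stage 3 ...
    "N4": "N3",   # ... and R&K stage 4 are merged into AASM NREM stage 3
    "REM": "REM",
}


def to_aasm(rk_hypnogram):
    """Convert a sequence of R&K epoch labels (one per 30-s epoch) to AASM labels."""
    return [RK_TO_AASM[label] for label in rk_hypnogram]


print(sorted(set(RK_TO_AASM.values())))  # → ['N1', 'N2', 'N3', 'REM', 'W']
```

The six R&K states collapse to five AASM stages, which is why the same recording can be scored into a six-class or a five-class problem.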
Of all the electrophysiological signals, EEG signals are the most widely used for the diagnosis of possible sleep-related diseases, because they directly provide information on brain activity during sleep. Another major advantage of EEG signals is that individual sleep stages are characterised by distinct EEG waveforms such as alpha, theta, beta, delta, and gamma waves [7]. Brain activity is recorded by electrodes attached to the scalp, which track the voltage changes produced by neuronal activity. The electrodes are placed according to the international 10-20 system, in which adjacent electrodes are separated by 10% or 20% of the total front-back or right-left distance of the skull [8]. Nowadays, EEG is the preferred signal in sleep laboratories for monitoring sleep quality in real time and diagnosing different types of sleep disorder, and it can also be recorded with portable devices at home so that sleep quality can be measured easily [9].
The wake stage is a relaxed state in which the subject prepares for sleep. After some time, the body enters NREM stage 1, where the sleep cycle begins; this is a transition phase between wakefulness and sleep that generally lasts 5 to 15 min, during which the brain normally produces theta waves [10]. NREM stage 2 is a somewhat deeper sleep than stage 1: heart rate and body temperature decrease, and sleep spindles and K-complexes occur [11]. Deep sleep begins in NREM stage 3, where brain activity slows markedly, and similar conditions continue in NREM stage 4; delta waves are produced in these stages. Finally, in the REM stage, low-amplitude brain waves are seen, blood pressure may suddenly increase, breathing becomes irregular, and rapid eye movements occur. This process repeats cyclically throughout the night, from the NREM stages to the REM stage, and each cycle lasts approximately 90 min [12]. A healthy night's sleep normally comprises three to five sleep cycles. In general, EEG recordings are highly complex, and their amplitude, phase, and frequency characteristics fluctuate continuously. Extracting meaningful information from raw EEG signals therefore requires analysing extended signal periods [13]. A further difficulty is that hours of recorded sleep data are hard to analyse within short 5-10 s time windows [14].
Different sleep studies have observed that the multi-channel EEG approach has limitations for evaluating irregularities, owing to the disturbance it causes to the subject. Therefore, most sleep researchers have used a single EEG channel for sleep stage classification.
Traditionally, technicians have analysed and interpreted sleep stages manually. This approach has several limitations: it is difficult to manage the huge number of sleep EEG records, the process is time-consuming and expensive, and the interpretation depends on the human scorer [15]. As a result, automated sleep stage classification systems are required to achieve better classification accuracy.
In the present research, we propose an automated sleep staging system based on single-channel EEG signals from subjects with different health conditions. The proposed research comprises five main stages: signal preprocessing, feature extraction, feature screening, classification, and finally a comparative analysis between the different categories of subjects.
In the proposed sleep study, we considered three categories of feature combinations: the first uses all 12 features, the second the best nine features of the extracted feature vector, and the third the best five features. The novelty of this research work is the analysis of the subjects' sleep behaviour through the statistically significant differences among the sleep states.
The rest of the paper is organized as follows. Section 2 reviews existing contributions. Section 3 explains the preparation of the experimental data. Section 4 describes the methodology in full, including data preprocessing, feature extraction, feature selection, classification, and model performance evaluation. Section 5 discusses the experimental results of the proposed methodology for the three combinations of feature sets. Section 6 ends with concluding remarks and a description of future work.

| RELATED WORK
Recently, different authors have proposed sleep studies aimed at the diagnosis of sleep-related disorders. Previous sleep staging work has been based on EEG recordings and various signal analysis techniques. Automatic sleep stage classification is a challenging task, because it requires proper handling of EEG signal processing at different levels: segmentation of the sleep epochs, feature extraction, feature screening, and finally the selection of suitable classification models. In our literature survey, we have mainly focused on the EEG signals considered, the types of features extracted, and the classification models used by different authors for analysing sleep irregularities. Some of the existing work on sleep scoring, and the methodologies adopted by the researchers, are described below.
Flexer et al. used a hidden Markov model to classify three sleep stages and achieved an overall classification accuracy of 80% [28].
Berthier et al. proposed two- to five-stage sleep classification from EEG signals using iterative fuzzy-logic methods, with a reported accuracy of 82.9% [29].
Chapotot and Becq [30] proposed the classification of six sleep stages (wake, S1, S2, S3, S4, and REM) using a hybrid model combining decision rules and k-nearest neighbour (KNN). The model achieved an overall accuracy of 78%.
Sinha presented a three-state sleep stage scoring study that used wavelet transform techniques for signal segmentation and artificial neural network (ANN) techniques for classification, reporting a classification accuracy of 95.55% [31].
Ebrahimi et al. used wavelet-based features to discriminate the different sleep stages, together with an ANN classifier, and reported an accuracy of 93% [32].
Ronzhina et al. used power spectral density features and an ANN classifier for six-state sleep stage classification, with a reported accuracy of 82.9% [33].
Jo et al. proposed a four-state sleep stage classification using a genetic-fuzzy classifier, which achieved an overall classification accuracy of 84.6% [34].
Hsu et al. proposed a sleep stage classification method that extracts energy features from sleep EEG records to distinguish the sleep stages. Using a neural network model, they reported an accuracy of 87.2% for five-state sleep stage classification [35].
Fraiwan et al. extracted time-frequency entropy features to represent the sleep records and used a linear discriminant analysis classifier, achieving an accuracy of 84% for six-state sleep stage classification [11].
Eduardo T. Braun presented an efficient and effective sleep stage scoring system based on frequency-domain features and a random forest (RF) classifier, which achieved accuracies of 90.9%, 91.8%, 92.4%, 94.3%, and 97.1% for 2-6 sleep stages, respectively [36].
In [37], the author used graph theory to distinguish the sleep stages, constructing difference visibility graphs and visibility graphs from a single EEG channel. Using the Sleep-EDF dataset and a support vector machine (SVM) classifier for 2- to 6-state sleep stage classification, the model achieved an average accuracy of 87.5%.
S.-F. Liang used sleep recordings from the public Sleep-EDF database and a linear discriminant analysis classifier, achieving an accuracy of 83.6% [2].
In [38], the study used EEG signals from the Sleep-EDF dataset; the input signal was decomposed into five sub-bands (delta, theta, alpha, beta, and gamma) with Butterworth band-pass filters. Energy, standard deviation, and entropy features were extracted from each of the five sub-bands. The author used an SVM classifier for two-state sleep stage classification and achieved a success rate of 92%.
Hassan et al. presented six different studies on sleep stage scoring using the EEG signal. In their first work [39], they extracted statistical moment features from a single EEG channel and used a decision tree (DT) classification algorithm. The reported accuracies for 6- to 2-state classification were 86.8%, 90.6%, 92.1%, 94.1%, and 99.4%, respectively.
In [40], the author presented another method for sleep stage classification that uses empirical mode decomposition to decompose the EEG signal into different components. The extracted statistical features are fed to an adaptive boosting (AdaBoost) classifier. The reported overall accuracies for 6- to 2-state classification were 88.62%, 90.11%, 91.2%, 93.55%, and 97.73%.
In [41], the author also used empirical mode decomposition to decompose the signal into different frequency sub-bands. All experiments were conducted on Sleep-EDF EEG records, and the selected features were fed to a bagged classification tree, which achieved overall accuracies of 86.89%, 90.69%, 92.14%, 94.10%, and 99.48% for 6- to 2-state classification.
In [43], the author used the tunable-Q wavelet transform (TQWT) for signal decomposition, extracted normal inverse Gaussian features from the resulting sub-bands, and used an AdaBoost classifier; the reported accuracies were 90.01%, 91.36%, 92.46%, 94.83%, and 98.01% for 6- to 2-state classification.
In [44], the author used two benchmark sleep datasets, Sleep-EDF and Dream, with the same TQWT technique for signal decomposition and a bagging classifier; the reported accuracies for 2- to 6-state classification were 92.43%, 93.69%, 94.36%, 96.55%, and 99.75%.
Subasi et al. proposed a three-state classification (alert, drowsy, and sleep) from EEG signals, using the wavelet transform for signal decomposition. The highest classification accuracy was achieved with an ANN classifier, with reported accuracies of 92.3%, 96.2%, and 93.6% for the alert, drowsy, and sleep states, respectively [2].

| Experimental data
In this research work, we used sleep recordings from the ISRUC-Sleep public repository, which was prepared by sleep experts at the Hospitalar and University of Coimbra [14]. This dataset was prepared specifically for sleep-related analysis. It contains three subgroups of data obtained from subjects with different medical conditions: subgroup-I contains one-session recordings of 100 subjects, subgroup-II contains recordings of eight subjects with two sessions per subject, and subgroup-III contains one-session recordings of 10 healthy subjects. All recordings are whole-night recordings that include six EEG channels, two EOG channels, and three EMG channels. The signals were acquired at a sampling rate of 200 Hz, with electrodes placed according to the international 10-20 standard system. In the present study, sleep irregularities are analysed through sleep stage classification using a single EEG channel, C3-A2. The C3-A2 channel has been used in many recent sleep studies, several of which [2, 41, 45-60] achieved high classification accuracies with it.
In this study, we used data from two subgroups of the ISRUC-Sleep dataset: four subjects from subgroup-I, who were affected by different sleep problems, and four subjects from subgroup-III, who were healthy controls. In total, we used eight polysomnography (PSG) records (4 subjects × 1 PSG + 4 subjects × 1 PSG = 8 PSGs). Tables 1 and 2 present the number of epochs in each sleep stage for the sleep-disordered subjects and the healthy controls, respectively.

| Manual methods of sleep scoring
Each polysomnography recording has a corresponding annotation file, prepared according to the AASM rules by two well-trained physicians, in which each epoch is labelled with one of the sleep stages. The recordings are segmented into 30-s epochs.

| Data preparation
The overnight PSG recordings were divided into 30-s epochs, so each EEG segment extracted from a subject is a rectangular window of 6000 samples (30 s × 200 Hz). We prepared one dataset per subject, $D = \{D_1, D_2, \ldots, D_8\}$, each of dimension $m \times n$, where $m$ is the total number of epochs and $n$ is the number of samples per epoch. Let $D_{ij}$, $i = 1, 2, \ldots, m$, $j = 1, 2, \ldots, n$, be an element of the data matrix $D$. In this step, $n = 30\ \text{s} \times F_s = 6000$ samples, with $F_s = 200$ Hz.
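Building the $m \times n$ matrix $D$ for one subject amounts to a single reshape. This is a minimal NumPy sketch (the study itself used MATLAB), assuming the C3-A2 channel is already loaded as a 1-D array sampled at 200 Hz; the function name `to_epoch_matrix` is hypothetical, and dropping trailing samples that do not fill a complete epoch is an assumption, not something the study specifies.

```python
import numpy as np

FS = 200                              # ISRUC-Sleep sampling rate (Hz)
EPOCH_SEC = 30                        # epoch length (s)
SAMPLES_PER_EPOCH = FS * EPOCH_SEC    # n = 6000


def to_epoch_matrix(signal):
    """Split a 1-D overnight recording into an (m x 6000) matrix of 30-s epochs.

    Row i holds epoch i; samples after the last complete epoch are
    discarded (an assumption made for this sketch).
    """
    signal = np.asarray(signal)
    m = len(signal) // SAMPLES_PER_EPOCH
    return signal[: m * SAMPLES_PER_EPOCH].reshape(m, SAMPLES_PER_EPOCH)
```

For a recording containing 750 complete epochs, the result has shape (750, 6000), the per-subject input size used in the experiments.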

| PROPOSED METHODOLOGY
The overall layout of the proposed study is presented in Figure 1. The experiment is implemented in several phases: the first phase performs signal preprocessing; the second phase extracts features and analyses their characteristics; the third phase selects suitable features using a feature selection algorithm; and the last phase performs the classification task. Each module is described in detail in this section.

| Signal preprocessing
The EEG signals recorded from subjects during sleep contain different types of artefacts and noise; in general, a recorded signal is always a mixture of the brain signal and some noise. Before the input signal is processed further to analyse irregularities, this irrelevant noise must first be reduced. In this study, we used two techniques to eliminate noise and artefacts. The recorded signal is modelled as the additive mixture $E_s + N_s$, where $E_s$ represents the original EEG signal and $N_s$ represents the noise signal. Our main objective is to suppress $N_s$ as far as possible so that we finally obtain undistorted brain signal information. For this, we used principal component analysis (PCA), which helps reduce noise by decorrelating correlated variables based on the linear relationships between different observations.
In addition, muscle artefacts and eye movements sometimes occur while the brain signals are being recorded; they become mixed into the actual brain recordings and can sometimes hinder an accurate diagnosis. To eliminate these factors, we applied a band-pass Butterworth filter with a lower cut-off frequency of 0.5 Hz and an upper cut-off frequency of 45 Hz.
The EEG signal consists of characteristic wave patterns in different frequency ranges. Generally, EEG signals are composed of alpha (α), theta (θ), delta (δ), beta (β), sawtooth, sigma (σ), and K-complex characteristic waves. These waveforms appear during different stages of sleep, so it is important to analyse them during sleep stage classification to identify irregularities in the sleep patterns. The characteristics of the waveforms in different frequency ranges for the different sleep stages are shown in Table 3.
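The 0.5-45 Hz Butterworth band-pass step can be sketched with SciPy. The study reports only the cut-off frequencies, so the filter order (4), the second-order-sections form, and zero-phase forward-backward filtering with `sosfiltfilt` are assumptions made here; the PCA step is not shown.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_eeg(x, fs=200.0, low=0.5, high=45.0, order=4):
    """Band-pass filter EEG samples along the last axis.

    Second-order sections keep the filter numerically stable, and
    sosfiltfilt runs it forwards and backwards, so the output has no
    phase shift (the effective magnitude response is squared).
    """
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, np.asarray(x, dtype=float), axis=-1)


# Example: within one 30-s epoch, a 10 Hz (alpha-band) component passes,
# while an 80 Hz component above the 45 Hz cut-off is removed.
fs = 200.0
t = np.arange(6000) / fs
x = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 80 * t)
y = bandpass_eeg(x, fs=fs)
```

Because `axis=-1` is used, the same call filters a whole (epochs × 6000) matrix row by row, although filtering the continuous recording before epoching avoids edge effects at every epoch boundary.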

| Feature extraction
Feature extraction is one of the most important steps in sleep stage classification, because poorly chosen features degrade the performance of the classification task. It is therefore important to design an effective system for extracting features from EEG signals, which are non-stationary time series that can be treated as a sequence of shorter, approximately stationary segments. In this study, the input EEG signals are segmented into 30-s epochs (6000 sample points).
In the proposed sleep staging study, we used a statistical approach to extract time-domain features from the input signal, computing various statistical parameters as features. The statistical parameters mean, standard deviation, minimum, maximum, median, variance, skewness, and kurtosis are considered for characterising the patients' sleep behaviour in the individual sleep stages. Among these parameters, the first- to fourth-order moments, namely the mean (first raw moment, M1), variance (second central moment, M2), skewness (normalised third central moment, M3), and kurtosis (normalised fourth central moment, M4), were computed from the 30-s EEG epochs to measure central tendency, dispersion, asymmetry, and peakedness, respectively. The variance (M2) helps to distinguish REM sleep from NREM stages N2 and N3. The third quartile (Q3) is the value below which 75% of the data lie; this feature has also been used in existing studies to discriminate the characteristics of the sleep stages. The zero crossing rate (ZCR) counts the number of times the EEG signal crosses a reference line given by its mean value. This feature is well suited to characterising sleep spindles and also helps to analyse sleep stage activity in the EEG signal. Overall, we considered 12 key features to represent the sleep behaviour in the EEG records: {mean, mode, median, standard deviation, maximum, minimum, range, variance, third quartile, ZCR, skewness, and kurtosis}. The mathematical expressions and short explanations of the extracted features are presented in Table 4.
In this study, we conducted three experiments based on three feature vectors: the first used all 12 features for classification, and the other two used the nine and five features with the highest weights, respectively. The proposed feature extraction approach is shown in Figure 3.
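The 12 per-epoch features can be sketched with NumPy and SciPy. Several choices below are assumptions because the study does not state them: the mode of a real-valued signal is taken as the most frequent sample value (ties go to the smallest), the standard deviation and variance use N rather than N - 1, Q3 uses the $3(N+1)/4$ position rule (NumPy's `method="weibull"`, available from NumPy 1.22), and kurtosis is the normalised fourth moment rather than excess kurtosis. The function names are hypothetical.

```python
import numpy as np
from scipy import stats

FEATURE_NAMES = ("mean", "mode", "median", "std", "max", "min", "range",
                 "variance", "q3", "zcr", "skewness", "kurtosis")


def epoch_features(x):
    """Return the 12 statistical features of one EEG epoch as a dict."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    # Mode: most frequent sample value (np.unique sorts, so ties -> smallest).
    values, counts = np.unique(x, return_counts=True)
    mode = values[np.argmax(counts)]
    # ZCR: number of crossings of the reference line at the mean value.
    above = (x - mean) >= 0
    zcr = int(np.count_nonzero(above[1:] != above[:-1]))
    return {
        "mean": mean,
        "mode": mode,
        "median": np.median(x),
        "std": x.std(),                      # population SD (divides by N)
        "max": x.max(),
        "min": x.min(),
        "range": x.max() - x.min(),
        "variance": x.var(),
        "q3": np.quantile(x, 0.75, method="weibull"),   # 3(N+1)/4 position
        "zcr": zcr,
        "skewness": stats.skew(x),           # E[(x - mean)^3] / sigma^3
        "kurtosis": stats.kurtosis(x, fisher=False),    # normal -> 3
    }


def feature_matrix(epochs):
    """Map an (epochs x samples) matrix to an (epochs x 12) feature matrix."""
    return np.array([[epoch_features(e)[k] for k in FEATURE_NAMES]
                     for e in epochs])
```

Applied to one subject's 750 × 6000 epoch matrix, `feature_matrix` yields a 750 × 12 matrix, one row of features per epoch.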

| Feature selection
Feature selection is an important step in machine learning problems, as it supports better analysis in any pattern classification task. Its major advantages are reducing the complexity and increasing the performance of the model. Different studies have observed that selecting appropriate features is a major obstacle in learning problems, because the original features do not always perform well in classification tests: some features may be ineffective because they are irrelevant, redundant, or noisy, and such features reduce performance. Feature selection is therefore necessary for classification, and it also indirectly reduces the computational cost [65]. In this study, we used the ReliefF feature-weighting algorithm to screen the relevant features that help to recognise irregularities in the sleep pattern. ReliefF is a supervised feature-weighting algorithm: it estimates how useful each feature is for discriminating between the different stages. Its major advantage is that it can deal with unknown and redundant data [66]. The feature weights are computed with Equation (2).
Equation (2) gives the quality weight $W_T(i)$ of each feature, computed over the feature vectors $x_k$, $k = 1, 2, \ldots, F$; for each of the $m$ randomly selected instances, it uses the nearest miss $x_m$ (the closest instance of a different class) and the nearest hit $x_h$ (the closest instance of the same class).
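For illustration, the standard multi-class ReliefF update can be sketched in NumPy: differences to the k nearest hits lower a feature's weight, and differences to the k nearest misses of each other class raise it, scaled by that class's prior. This is not necessarily the exact variant or settings used in the study; the function name, the Manhattan distance on range-normalised features, and the defaults `k=10` and `m` = all instances are assumptions.

```python
import numpy as np


def relieff(X, y, k=10, m=None, seed=0):
    """Return one ReliefF weight per feature (higher = more relevant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, n_features = X.shape
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0                    # constant features end with weight 0
    Xn = (X - lo) / span                     # diff(A, I1, I2) = |a1 - a2| / (max - min)
    classes, counts = np.unique(y, return_counts=True)
    prior = dict(zip(classes, counts / n))
    m = n if m is None else m
    rng = np.random.default_rng(seed)
    w = np.zeros(n_features)
    for r in rng.choice(n, size=m, replace=False):   # m randomly selected instances
        dist = np.abs(Xn - Xn[r]).sum(axis=1)        # Manhattan distance to all instances
        for c in classes:
            idx = np.flatnonzero((y == c) & (np.arange(n) != r))
            if idx.size == 0:
                continue
            near = idx[np.argsort(dist[idx])[:k]]     # k nearest hits or misses in class c
            diff = np.abs(Xn[near] - Xn[r]).mean(axis=0)
            if c == y[r]:
                w -= diff / m                         # nearest hits: penalise differences
            else:
                w += prior[c] / (1.0 - prior[y[r]]) * diff / m   # nearest misses: reward
    return w
```

A nine- or five-feature subset then corresponds to keeping the top-ranked columns, for example `np.argsort(-w)[:9]` or `np.argsort(-w)[:5]`.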

| Decision tree classifier
The decision tree (DT) is a structured classification technique that is easier to comprehend than other classification algorithms. DT is widely used in many types of classification tasks [67], mainly because of its simplicity and the ease of understanding the rules encoded in the tree structure. A decision tree is constructed from a training dataset in which each sample contains feature values and a class label. DT works by inductive inference. Its major advantage is that it can deal with noisy and missing data in the dataset. It also follows a multistage, sequential approach to classification: the tree is first generated, after which the samples are passed through it one by one to be classified. Each node of the tree represents a test on a feature of the training set, and each branch leaving the node corresponds to a value of that feature. Many algorithms have been designed for building decision trees, some of which are more widely accepted across classification applications; the most used in the literature are C4.5, ID3, and C5.
SATAPATHY ET AL.
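To make the node test concrete, the entropy-based information-gain criterion used by ID3 (C4.5 refines it into the gain ratio) can be sketched for one continuous feature split at a threshold. The helper names are hypothetical, and using midpoints between distinct sorted values as candidate thresholds is a common convention for continuous attributes, not a detail reported in the study.

```python
import numpy as np


def entropy(labels):
    """Shannon entropy (bits) of an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def information_gain(feature, labels, threshold):
    """Entropy reduction obtained by splitting on feature <= threshold."""
    left, right = labels[feature <= threshold], labels[feature > threshold]
    n = len(labels)
    children = sum(len(part) / n * entropy(part) for part in (left, right) if len(part))
    return entropy(labels) - children


def best_threshold(feature, labels):
    """Threshold with the highest information gain, or None for a constant feature."""
    values = np.unique(feature)
    if values.size < 2:
        return None
    candidates = (values[:-1] + values[1:]) / 2    # midpoints between sorted distinct values
    gains = [information_gain(feature, labels, t) for t in candidates]
    best = int(np.argmax(gains))
    return float(candidates[best]), gains[best]


# Example: a hypothetical feature that separates N3 epochs from W epochs.
feature = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
labels = np.array(["N3", "N3", "N3", "W", "W", "W"])
print(best_threshold(feature, labels))  # → (6.5, 1.0)
```

A full tree repeats this search over all features at every node and recurses on the two partitions until the nodes are pure or a stopping rule applies.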

| K-nearest neighbor classifier
KNN is one of the most mature and theoretically simplest models compared with other machine learning classifiers [68]. Its main working principle is to find samples with similar characteristics by measuring the distance between them. It is well suited to multi-modal data distributions: when the samples of a certain class are scattered across different locations of the feature space, it is difficult for many classifiers to define decision boundaries, but KNN handles this case because it assigns a label to each input sample by a majority vote of its k nearest sample points [69]. The key operation of KNN is therefore computing the distance between objects in the feature space; two formulas are commonly used for this, the Euclidean distance and the Manhattan distance.
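The distance-and-vote rule described above can be sketched from scratch in NumPy with both distance measures. The function name `knn_predict`, the default `k=5`, and the tie-breaking rule are illustrative assumptions; the study does not report the value of k it used.

```python
import numpy as np
from collections import Counter


def knn_predict(X_train, y_train, X_query, k=5, metric="euclidean"):
    """Label each query sample by a majority vote of its k nearest training samples."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    diff = X_query[:, None, :] - X_train[None, :, :]    # (queries, train, features)
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=-1))         # sqrt(sum (x_i - y_i)^2)
    elif metric == "manhattan":
        dist = np.abs(diff).sum(axis=-1)                 # sum |x_i - y_i|
    else:
        raise ValueError(f"unknown metric: {metric}")
    nearest = np.argsort(dist, axis=1)[:, :k]            # indices of the k closest samples
    # Ties in the vote go to the class encountered first, i.e. the nearer neighbour.
    return np.array([Counter(y_train[row]).most_common(1)[0][0] for row in nearest])
```

Because the vote depends only on distances, features with large numeric ranges (for example, variance compared with ZCR) dominate unless the feature columns are standardised first.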

| Random forest
This algorithm, proposed by Breiman, is a popular classification technique that uses multiple tree structures to train on the data and predict the samples [70]. Each tree is built from randomly sampled data values and acts as a separate classifier. The major difference between RF and other classification techniques is that the input to each tree is selected randomly using bootstrap sampling, which makes the ensemble less sensitive to noisy and outlier samples; finally, the output is computed by majority voting over the trees.
Table 4 Mathematical expressions and short descriptions of the extracted statistical features
Mean [60]: With N the length of the data sample x and $\bar{x}$ its mean. The mean electrical potential of an epoch is calculated; it measures the central tendency of the data points.
2: It is used to quantify the range of the data and helps to find the magnitude of the signal baseline.
3 [61]: With N the length of the data sample x and $\bar{x}$ its mean. It helps to determine how the data are dispersed with respect to the mean value.
5: It helps to measure how widely the data set spreads out.
Range: It computes the difference between the maximum and minimum values, which gives an estimate of the spread of the data.
Median [62]: The $\left(\frac{N+1}{2}\right)$th value of the sorted data, with N the length of the data sample x. It gives information about the centre and spread of the signal data.
9 Third quartile [63]: $Q_3 = \left(\frac{3(N+1)}{4}\right)$th value of the sorted data, with N the length of the data sample x. The quartile analysis provides amplitude information about the signal, which helps to discriminate the sleep stages; Q3 is the value below which 75% of the data are located.
10 Skewness [41]: $\mathrm{skew} = \frac{E\left[(x_i - \bar{x})^3\right]}{\sigma^3}$, where N is the length of the signal data $x_i$, $\sigma$ is the standard deviation of the sample data, and E is the expected value, $E(x) = \sum_{i=1}^{N} p_i x_i$, with $p_i$ the probability associated with the signal value $x_i$. Skewness measures the symmetry of the signal distribution about the mean: it is zero for a normal distribution, while positive and negative skewness indicate that the data are skewed to the right and to the left, respectively. It is a higher-order statistic (third moment).
11 Kurtosis [64]: $\mathrm{kurt} = \frac{E\left[(x_i - \bar{x})^4\right]}{\sigma^4}$, where N is the length of the signal data $x_i$. It measures whether the data are peaked or flat relative to the normal distribution. It is a higher-order statistic (fourth moment).
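The two ingredients of random forest described in this section, bootstrap sampling and majority voting, can be made explicit in a short sketch. It is a simplified illustration (the class name `TinyRandomForest` and its defaults are hypothetical) that uses scikit-learn's `DecisionTreeClassifier` with `max_features="sqrt"` as the base learner; scikit-learn's own `RandomForestClassifier` implements the full method, although it averages the trees' predicted class probabilities instead of counting hard votes.

```python
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier


class TinyRandomForest:
    """Bagged decision trees with random feature subsets and hard majority voting."""

    def __init__(self, n_trees=25, seed=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(seed)
        self.trees = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            idx = self.rng.integers(0, n, size=n)     # bootstrap: n draws with replacement
            tree = DecisionTreeClassifier(
                max_features="sqrt",                  # random feature subset at each split
                random_state=int(self.rng.integers(1 << 31)),
            )
            self.trees.append(tree.fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        votes = np.array([tree.predict(X) for tree in self.trees])   # (trees, samples)
        return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])
```

Each tree sees a slightly different resampled training set, so individual noisy or outlying epochs influence only some of the trees, and the vote smooths out their effect.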

| Model performance evaluation
To validate the performance of the proposed sleep stage classification system, we used five evaluation metrics: sensitivity [71], specificity [35], precision [72], classification accuracy [73], and F1 score [74], defined in Equations (3)-(7):
$\text{Sensitivity} = \frac{TP}{TP + FN}$ (3)
$\text{Specificity} = \frac{TN}{TN + FP}$ (4)
$\text{Precision} = \frac{TP}{TP + FP}$ (5)
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$ (6)
$\text{F1} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$ (7)
where TP, TN, FP, and FN are the numbers of true positives, true negatives, false positives, and false negatives, respectively. In addition, we used k-fold cross-validation, one of the most popular evaluation procedures in pattern recognition. In this procedure, the whole dataset is divided into k subsets; each subset in turn is used for testing while the remaining k - 1 subsets are used for training, and the results are averaged over the k folds. In this work, we used 10-fold cross-validation on the considered dataset.
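The sensitivity, specificity, precision, F1 score, and accuracy metrics can be computed from a multi-class confusion matrix, such as those reported in Table 5, by treating each sleep stage one-vs-rest. This NumPy sketch assumes rows are true stages and columns are predicted stages; the function name is hypothetical. Overall accuracy is reported as the fraction of correctly classified epochs, which reduces to (TP + TN) / (TP + TN + FP + FN) in the two-class case.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-stage metrics from a confusion matrix (rows = true, columns = predicted).

    Each stage is treated one-vs-rest; a stage with an empty denominator gives nan.
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp          # true stage, predicted as another stage
    fp = cm.sum(axis=0) - tp          # another stage, predicted as this stage
    tn = total - tp - fn - fp
    with np.errstate(divide="ignore", invalid="ignore"):
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        precision = tp / (tp + fp)
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        # Overall accuracy: correctly classified epochs / all epochs.
        "accuracy": float(tp.sum() / total),
    }
```

With 10-fold cross-validation, these metrics are computed for each fold's test set, or from a confusion matrix accumulated over all folds, and then reported as averages.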

| RESULT ANALYSIS AND DISCUSSION
FIGURE 3 Structural framework of the statistical feature extraction. EEG, electroencephalogram
SATAPATHY ET AL.
To measure the performance of the proposed system, we designed a series of experiments using the dataset described in Section 3. As mentioned earlier, we validated the model with different numbers of statistical features, following three approaches. The first experiment used all the extracted features. In the second and third approaches, the best nine and five features, chosen by a feature selection algorithm, were used as input to the classification model. All three approaches were applied to each subject enrolled in this research work. All experiments were implemented in MATLAB (version 2017b) on a computer with a 3.40 GHz Intel(R) Core(TM) i7 CPU and 8 GB of RAM. The study used one dataset with two categories of subjects: one category was affected by sleep-related disorders, and the other was completely healthy with no prior controlled medication. We used the single EEG channel C3-A2 as the input channel for the proposed model. Each subject contributed the same number of epochs, 750, and each epoch is 30 s long. With a sampling frequency of 200 Hz, each epoch contains 30 × 200 = 6000 samples, so the input size for each subject is 750 × 6000.
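The input dimensions follow directly from the acquisition parameters: 30 s × 200 Hz = 6000 samples per epoch, and 750 epochs per subject give a 750 × 6000 matrix. Below is a minimal sketch of that segmentation step. It assumes the C3-A2 channel is available as a flat Python list of samples. The function name is hypothetical, and discarding an incomplete final epoch is our assumption.

```python
FS_HZ = 200                                 # sampling frequency stated in the text
EPOCH_SECONDS = 30                          # epoch length stated in the text
SAMPLES_PER_EPOCH = FS_HZ * EPOCH_SECONDS   # 30 s x 200 Hz = 6000 samples


def segment_into_epochs(signal, fs=FS_HZ, epoch_seconds=EPOCH_SECONDS):
    """Split a single-channel recording into consecutive, non-overlapping epochs.

    Assumption: any incomplete trailing epoch is discarded.
    """
    n = fs * epoch_seconds
    n_epochs = len(signal) // n
    return [signal[i * n:(i + 1) * n] for i in range(n_epochs)]
```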
To analyse the sleep characteristics of each subject and the irregularities that occurred during sleep, we extracted time-series information from the recorded signals. One category of subjects had symptoms of sleep-related disorders, while the other consisted of healthy controls in whom no sleep-related problems were found. The main objective of this work is to determine which combination of features gives the best sleep stage classification accuracy.

| Set of twelve features
The first experiment used all 12 features {X_ME, X_VAR, X_MAX, X_MIN, X_RAN, X_ME, X_MODE, X_SD, X_ZCR, X_SK, X_KU, X_TQ} to classify the sleep stages. For each feature, 750 values (one per epoch) were extracted. Table 5(A) presents the confusion matrix for the sleep-disordered subjects and Table 5(B) that for the healthy control subjects.
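The per-epoch feature extraction can be sketched as follows. The extraction of this document lost the subscripts of the feature names, so the mapping used here is an assumption: the first X_ME is read as the mean, the second as the median, and X_TQ as the third quartile. Variance, skewness, and kurtosis use population moments (kurtosis as the fourth standardized moment), which may differ from the authors' MATLAB settings. Function and key names are hypothetical.

```python
import statistics


def zero_crossing_rate(x):
    """Fraction of consecutive sample pairs whose sign changes (needs >= 2 samples)."""
    changes = sum((a >= 0) != (b >= 0) for a, b in zip(x, x[1:]))
    return changes / (len(x) - 1)


def epoch_features(x):
    """Twelve per-epoch statistics; key names are guesses at the X_* abbreviations."""
    n = len(x)
    mean = statistics.fmean(x)
    var = statistics.pvariance(x, mu=mean)      # population variance (assumption)
    sd = var ** 0.5
    m3 = sum((v - mean) ** 3 for v in x) / n    # third central moment
    m4 = sum((v - mean) ** 4 for v in x) / n    # fourth central moment
    return {
        "X_MEAN": mean,                           # first X_ME, read as mean (assumption)
        "X_VAR": var,
        "X_MAX": max(x),
        "X_MIN": min(x),
        "X_RAN": max(x) - min(x),
        "X_MED": statistics.median(x),            # second X_ME, read as median (assumption)
        "X_MODE": statistics.mode(x),             # first most common value
        "X_SD": sd,
        "X_ZCR": zero_crossing_rate(x),
        "X_SK": m3 / sd ** 3 if sd else 0.0,      # skewness
        "X_KU": m4 / sd ** 4 if sd else 0.0,      # kurtosis
        "X_TQ": statistics.quantiles(x, n=4)[2],  # read as third quartile (assumption)
    }


def feature_matrix(epochs):
    """One row of 12 features per epoch, e.g. 750 x 12 for 750 epochs."""
    return [list(epoch_features(e).values()) for e in epochs]
```

On raw EEG samples, where almost every value is distinct, `statistics.mode` simply returns the first sample, so a mode feature is only informative if the signal is quantised or binned first.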
The results of the first experiment, obtained with the three classifiers using all 12 features, are presented in Figure 4a-c for sleep-disordered subjects and in Figure 5a-c for healthy control subjects. For category-I subjects, random forest performed best, with overall accuracies of 91.60%, 84.13%, 86.26%, and 88.93% for subjects 15, 16, 19, and 23, respectively. For healthy control subjects, the ensemble classifier performed best, and the accuracies for subject-1, subject-5, and subject-8 exceeded 90%. Subject-15 achieved the highest sensitivity among the category-I subjects, at 90.20%, 96.86%, and 94.71% for DT, KNN, and RF, respectively. Among the healthy subjects, subject-01 achieved the best sensitivity for all three classifiers, at 89.10%, 89.10%, and 94.46% for DT, KNN, and RF, respectively.
TABLE 5 Confusion matrix for two sleep stages based on 12 features: (A) category-I subjects, (B) category-II subjects

| Set of nine features
The second experiment is based on the top nine features, selected with the ReliefF feature selection algorithm: {X_ME, X_MIN, X_MODE, X_ME, X_MAX, X_RAN, X_TQ, X_SD, X_VAR}. The representation of each segment was 10 × 750. The confusion matrices for both categories of subjects based on the nine features are presented in Table 6(A) and (B). Compared with the first experiment, performance degraded for both categories of subjects when the number of features was reduced. The results for category-I subjects are presented in Figure 6, and those for healthy control subjects in Figure 7. [...] above 90% for all classifiers; similarly, for category-II subjects, the sensitivity of subject-1 was above 90% with the RF classifier.
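ReliefF scores each feature by how well it separates an instance from its nearest neighbours of other classes (misses) compared with its nearest neighbours of the same class (hits). Below is a compact sketch of the standard ReliefF update for numeric features. It uses every instance as a query and k = 10 neighbours by default. Both choices are assumptions, since the text does not state the authors' parameters. Function names are hypothetical.

```python
def relieff_weights(X, y, k=10):
    """ReliefF feature weights for numeric features (all instances used as queries)."""
    n, d = len(X), len(X[0])
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(j, a, b):  # per-feature difference normalised by the feature's range
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):     # Manhattan distance on normalised features
        return sum(diff(j, a, b) for j in range(d))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * d
    for i in range(n):
        xi, ci = X[i], y[i]
        for c in classes:
            candidates = [t for t in range(n) if t != i and y[t] == c]
            nearest = sorted(candidates, key=lambda t: dist(xi, X[t]))[:k]
            if not nearest:
                continue
            # hits lower a weight; misses raise it, weighted by the class prior
            sign = -1.0 if c == ci else 1.0
            scale = 1.0 if c == ci else prior[c] / (1.0 - prior[ci])
            for j in range(d):
                avg = sum(diff(j, xi, X[t]) for t in nearest) / len(nearest)
                w[j] += sign * scale * avg / n
    return w


def top_features(weights, m):
    """Indices of the m highest-weighted features."""
    return sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)[:m]
```

Given a 750 × 12 feature matrix and the stage labels, `top_features(relieff_weights(X, y), 9)` would return the indices of a nine-feature subset, and `m = 5` a five-feature subset.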

| Set of five features
The third experiment applied a set of five features to both categories of subjects. The five features were selected according to the feature weights generated by the ReliefF feature selection algorithm: {X_ME, X_MIN, X_MODE, X_ME, X_MAX}. The representation of each segment was 750 × 6. The confusion matrix for category-I subjects is shown in Table 7(A) and that for category-II subjects in Table 7(B). The statistical results are shown in Figure 8a-c for subjects affected by sleep-related disorders and in Figure 9a-c for healthy subjects.
A combination of the best five features was the most effective for classifying sleep stages in both categories of subjects. For subjects affected by sleep disorders, accuracy exceeded 90% with every classifier. Subject-23 achieved the highest classification accuracy of 93.42%, 93.60%,

| Comparative analysis
We used two comparative analyses to evaluate the proposed model. The first compared the proposed classification results with the performance of an ensemble (bagging) classifier. The second compared them with existing research works that use similar channels or similar datasets (Figure 10).

| Comparison with the ensemble (bagging) classifier
In this section, we compare the classification accuracy of the ensemble classifier with that of the proposed model for each combination of statistical features; the same feature sets were used for both. For category-I subjects, who were affected by different types of sleep-related disorders, the proposed model achieved better accuracy than the ensemble classifier with the five-feature set. Similarly, for category-II subjects, who were completely healthy, the proposed model outperformed the ensemble classifier with the 12-feature set. Figures 10a-d and 11a-d show the comparative accuracy results for all three feature combinations for category-I and category-II subjects.

| Performance comparison with existing sleep stage classification methods
The results of the current work are compared with state-of-the-art studies that address EEG input channels, two-stage classification, statistical features, and the datasets used. Table 8 compares the features used in the proposed work with those used by related works based on single-channel EEG signals from the ISRUC-Sleep dataset. Table 9 compares the proposed work with other similar studies in the literature that use single-channel EEG, together with their features and classification models.
Khalighi et al. [70] used the maximum overlap discrete wavelet transform to extract both linear and non-linear features and the mRMR feature selection algorithm to screen suitable features. The system reported an overall accuracy of 95% for wake-sleep classification using an SVM classifier.
Hugo Simoes et al. [71] used the R-squared Pearson correlation coefficient to select relevant features, which were fed into a Bayesian classifier, achieving an overall classification accuracy of 83%.
Khalighi et al. [35] used records from three subject categories of the ISRUC-Sleep repository, extracted both temporal and spectral features from the input channel, and applied the SSM4S classification method, achieving overall classification accuracies of 94.10%, 92.40%, and 95.39% for ISRUC-Sleep Subgroup-I, Subgroup-II, and Subgroup-III, respectively.
Sousa et al. [72] proposed a two-step EEG-based classifier that uses an SVM to identify epochs suspected of misclassification; time- and frequency-domain features were fed into the SVM classifier, with a reported classification accuracy of 86.75%.
Khalighi et al. [73] designed an improved, subject-independent automated sleep stage classification method applied to wake-sleep classification, using an SVM classifier that achieved 81.74% overall classification accuracy.
K. D. Tzimourta et al. [74] proposed an automated sleep staging system using EEG signals. The extracted energy features were fed into a random forest classification model, which reported an overall accuracy of 75.29% for the classification of five sleep stages.
Najdi et al. [75] proposed a sleep staging method based on a two-layer stacked sparse autoencoder using frequency-, time-frequency-, and time-domain features extracted from EEG signals. The reported classification accuracy was 82.2%.
Finally, Hashem Kalbkhani [76] introduced the Stockwell transform for signal decomposition, and the decomposed features were processed by SVM and KNN classifiers. The reported average accuracy was 82.33% for SVM and 81.00% for KNN.

| CONCLUSION AND FUTURE WORK
This work presents a robust system for sleep stage classification according to the AASM scoring rules, using a single EEG channel. The proposed methodology operates on 30-s EEG epochs, and the data used in the experiments are available in the ISRUC-Sleep database. The main contribution of this work is the use of statistical features to analyse sleep characteristics, with three different feature combinations used to classify two sleep stages. Three experiments, each with a different feature set, were applied to both sleep-disordered subjects and healthy control subjects. Three machine learning classifiers, DT, KNN, and RF, were used to classify the sleep stages. Additionally, we used the ReliefF feature selection algorithm to screen the best five and nine features, which helps to systematically screen the sleep characteristics. The five-feature set yielded the best performance for both categories of subjects in identifying sleep irregularities, with the reported accuracy exceeding 90% for all individual subjects. The 12-feature set also gave reasonable accuracy for two-state sleep stage classification. The performance of the proposed study was analysed in two ways: first by comparing the results of the proposed model with ensemble classification results, and second by comparing them with eight existing similar contributions. The comparison shows that our proposed model achieved the best classification accuracy. This work can guide clinicians to diagnose accurately and make appropriate decisions for the treatment of different types of sleep-related disorders.