  • sleep disorders;
  • somnolence;
  • obstructive sleep apnea;
  • narcolepsy



The aim of this study was to identify factors other than objective sleep tendency associated with scores on the Epworth Sleepiness Scale (ESS). There were 225 subjects, of whom 40% had obstructive sleep apnoea (OSA), 16% had simple snoring, and 4.9% had snoring with sleep disruption (upper airway resistance syndrome); 9.3% had narcolepsy and 7.5% had hypersomnolence without REM sleep abnormalities; 12% had chronic fatigue syndrome; 7.5% had periodic limb movement disorder and 3% had diurnal rhythm disorders. ESS, the results of overnight polysomnography and multiple sleep latency test (MSLT) and SCL-90 as a measure of psychological symptoms were recorded. The ESS score and the mean sleep latency (MSL) were correlated (Spearman ρ=−0.30, P<0.0001). The MSL was correlated with total sleep time (TST) and with sleep efficiency but not with apnoea/hypopnoea index. There was no association between the MSL and any aspect of SCL-90 scores, except a borderline significant association with the somatisation subscale. The ESS was correlated with TST but not with sleep efficiency or apnoea/hypopnoea index. The ESS was correlated with all subscales of the SCL-90 except psychoticism. An ESS≥10 had poor sensitivity and specificity as a predictor of MSL <10 min or MSL <5 min. We conclude that the MSLT and the ESS are not interchangeable. The ESS was influenced by psychological factors by which the MSL was not affected. The ESS cannot be used to demonstrate or exclude sleepiness as it is measured by MSLT.



The Epworth Sleepiness Scale (ESS) is a self-administered questionnaire which seeks to give a numerical value to the subjective sleepiness of patients with sleep disorders (Johns 1991, 1993, 1994). It has been widely used, but its validation has been limited – particularly as regards the number of consecutive patients studied and the range of disorders from which they have suffered. In order to extend the validation of the ESS, we compared it to the multiple sleep latency test (MSLT) in a large group of patients with a range of disorders associated with sleepiness.

The correlation of ESS scores and the mean sleep latency (MSL) measured in the MSLT is only moderate (Johns 1994). This may be because the two tests do not measure the same thing or because one is inaccurate. The ESS asks patients to estimate their chance of sleeping in a range of situations, and to speculate in the case of situations which they have not experienced recently. We hypothesised that in this context depressed patients, and those with a tendency to somatise their complaints, might overestimate their sleepiness, accounting for some of the disagreement between ESS and MSL. It is also intuitively plausible to suppose that a self-administered questionnaire is more likely to be affected by emotional and psychological factors than an objective measurement of sleep. Therefore, we administered the Symptom Checklist 90 (SCL-90) as a measure of psychological symptom intensity and correlated its results with MSL and ESS score.


The Sleep Disorders Centre at Royal Newcastle Hospital is the only referral centre for sleep disorders for a population of about 500 000. The community is predominantly urban and of Anglo-Celtic origin. All patients are referred by medical practitioners. Over 80% of referrals are from general practitioners with the remainder from specialist physicians and ear nose and throat surgeons. The Sleep Disorders Centre has been established since 1978 and is well-known to the medical community, so that referrals to centres in other cities and to specialists in areas other than sleep disorders are believed to be uncommon.

The data reported here concern 225 near-consecutive patients referred because of a suspicion of a sleep disorder causing excessive sleepiness. Patients were eligible for inclusion if the primary referral letter mentioned sleepiness as a complaint or mentioned a disorder typically associated with sleepiness (in most cases obstructive sleep apnoea or narcolepsy); there were 232 such patients during the period of the study. There were 108 ineligible patients, who were referred because of insomnia (102) or parasomnia. Eligible patients were seen by one of two sleep physicians (AA, who saw approximately 70%, or LGO). Patients were excluded if initial assessment demonstrated a disorder unrelated to sleep but likely to be the cause of the primary complaint: angina pectoris and/or cardiac failure in four cases and diabetes mellitus and chromophobe pituitary adenoma in one case each. One patient initially included was subsequently excluded because of lost ESS data.

Polysomnography and multiple sleep latency testing were performed by standard methodology. The instrumentation and scoring methods used in the laboratory have been described in detail elsewhere (Gyulay et al. 1993). The definitions of sleep disorders were those of the International Classification of Sleep Disorders (Diagnostic Classification Steering Committee 1990), except that upper airway resistance syndrome (UARS) was defined as disruptive snoring with symptomatic sleepiness, with fewer than 15 apnoeas/hypopnoeas but more than 15 arousals per hour of sleep. Chronic fatigue and chronic fatigue syndrome were diagnosed by the criteria of Fukuda et al. (1990) and fibromyalgia by the criteria of the American Rheumatism Association (Wolfe et al. 1990).

Patients arrived at the sleep laboratory following an evening meal without alcohol or caffeine. The ESS form was a replica of that used by Johns (1991). The patients completed the SCL-90 and the ESS before set-up for polysomnography. The patient's bed partner independently completed an ESS estimate of the patient's sleepiness if they accompanied the patient to the sleep laboratory or could complete and return the ESS within 24 h (n=126).

The SCL-90 was administered as a two-page printed form. The SCL-90 asks subjects to estimate the extent to which each of a list of symptoms has distressed them over the preceding seven days, rating each on a scale from ‘not at all’ (=0) to ‘extremely’ (=4). SCL-90 scores are the sum of these individual item scores divided by the number of items for each subscale or by 90 for the global severity index. The SCL-90 scores were standardised for age and gender using the norms for non-psychiatric outpatients (SCL-90-R: Administration, Scoring and Procedures Manual, Clinical Psychometric Research Inc, Towson, MD, USA).
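The raw scoring rule described above (item ratings from 0 to 4, averaged within a subscale, or over all 90 items for the global severity index) can be sketched as follows. This is only an illustration: the function names are hypothetical, the example ratings are invented, and the age and gender standardisation against published norms is not reproduced here.

```python
def subscale_score(ratings):
    """Raw subscale score: sum of the item ratings divided by the number of items."""
    if not ratings:
        raise ValueError("a subscale needs at least one item")
    if any(r not in (0, 1, 2, 3, 4) for r in ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    return sum(ratings) / len(ratings)


def global_severity_index(all_ratings):
    """Raw global severity index: sum of all 90 item ratings divided by 90."""
    if len(all_ratings) != 90:
        raise ValueError("the SCL-90 has 90 items")
    if any(r not in (0, 1, 2, 3, 4) for r in all_ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    return sum(all_ratings) / 90


# Invented ratings for a hypothetical four-item subscale.
example = subscale_score([0, 1, 2, 3])  # 1.5
```

Dividing by the number of items keeps subscales of different lengths on the same 0 to 4 range.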

The results were analysed using the Stata package (Release 5.0, Stata Corporation, College Station, TX, USA). ESS scores are reported both as mode and range and as mean and range, and MSL as mean and range. Spearman rank correlations were calculated for ESS scores, MSL, age, sleep variables from overnight polysomnography and SCL-90 scores. A logistic regression model predicting whether the ESS score was normal (10 or less) or abnormal (greater than 10) was constructed, with MSL and the global severity index of the SCL-90 selected arbitrarily as predictive variables.
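For readers unfamiliar with the Spearman rank correlation used throughout, a minimal sketch is given below. It is not the Stata routine used in the study: it simply ranks each variable (tied values share the average of their ranks) and computes the Pearson correlation of the ranks. Function names are hypothetical, and the sketch assumes neither variable is constant.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Because only ranks enter the calculation, the statistic does not require the ESS to be an interval scale.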


The modal and mean value of ESS and the mean MSL for each diagnosis with five or more cases are shown in Table 1. The remaining diagnoses were very heterogeneous: three of anxiety disorder, two of insufficient sleep syndrome, and one each of REM behaviour disorder, sleep misperception, central sleep apnoea, depression, advanced sleep phase syndrome, bronchiectasis and sleep walking. In general, higher ESS scores and lower MSL were observed in patients with diagnoses associated with prominent sleepiness, although there were individuals in all diagnostic groups with outlying results. The modal ESS scores for different diagnoses were more clearly separated than the mean scores, reflecting the disproportionate influence on the mean value of patients with outlying results.

Table 1.  Modes and ranges and means of Epworth Sleepiness Scale (ESS) scores and means and ranges of mean sleep latency (MSL) for all diagnoses made in five or more cases

Correlations between ESS scores and the other variables for the whole sample and for patients with MSL less than 5 min are shown in Table 2, and between MSL and the same variables in Table 3. The observed correlations were modest, and in the sample as a whole no variable was to any significant extent more closely related to the ESS or MSL than several others. MSL was, as would be expected, more closely related to the sleep variables from the preceding night. The correlation of MSL and scores on the somatisation subscale of the SCL-90 was of borderline statistical significance, taking account of the number of correlations estimated. The ESS was correlated with the global severity index of the SCL-90 and with all the subscales except psychoticism.

Table 2.  Correlation between Epworth Sleepiness Scale (ESS) scores and other parameters for all patients and separately for those with MSL less than 5 min
Table 3.  Correlation between mean sleep latency (MSL) measured during multiple sleep latency tests (MSLT) and other parameters

When only objectively sleepy patients (those with MSL less than 5 min) were considered, the correlation between ESS and MSL was lost. In this group, only scores on the somatisation subscale of the SCL-90 were clearly correlated with the ESS, although this association and those with depression and the global severity index were closer than in the group as a whole.

Neither ESS nor MSL was correlated with apnoea/hypopnoea index, whether all subjects or only those with obstructive sleep apnoea were included. Neither ESS nor MSL was correlated with the percentage of sleep time spent with arterial oxygen saturation less than 90%.

Partners’ estimates of the patients’ sleepiness using the ESS were correlated with patients’ own estimates of their sleepiness (Spearman ρ=0.72, P<0.0001). Although overall agreement was relatively close, marked disagreement was common. Figure 1 displays a Bland-Altman plot (Bland and Altman 1986) of the difference between the patients’ and their partners’ ESS estimates against the mean of the two estimates. There was no systematic variation in the disagreement as the mean values increased, but it is clear that discrepancies tended to be larger when the mean ESS was lower. In the range of mean ESS from 6 to 12, the standard deviation of the difference between patients’ and partners’ estimates was 4.4 and the range was –12 to 13. In the range of mean ESS from 12 to 18 the standard deviation of the difference was 3.4 and the range was –9 to 5.
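The quantities behind a Bland-Altman comparison of this kind can be sketched as below. The function name and the example scores are hypothetical (they are not study data), and the 1.96 SD limits of agreement are the conventional Bland-Altman summary rather than a figure reported in this paper.

```python
from statistics import mean, stdev


def bland_altman(patient_ess, partner_ess):
    """Per-pair means and differences, plus bias, SD and limits of agreement."""
    if len(patient_ess) != len(partner_ess) or len(patient_ess) < 2:
        raise ValueError("need at least two paired scores")
    diffs = [p - q for p, q in zip(patient_ess, partner_ess)]
    means = [(p + q) / 2 for p, q in zip(patient_ess, partner_ess)]
    bias = mean(diffs)   # average patient-minus-partner difference
    sd = stdev(diffs)    # sample standard deviation of the differences
    return {
        "means": means,  # x-axis of the plot
        "diffs": diffs,  # y-axis of the plot
        "bias": bias,
        "sd": sd,
        "limits": (bias - 1.96 * sd, bias + 1.96 * sd),
    }


# Invented scores for four patient/partner pairs.
result = bland_altman([10, 12, 8, 15], [8, 12, 11, 14])
```

Restricting the input to pairs whose mean falls in a given band (for example 6 to 12) would give the band-specific standard deviations and ranges of the kind quoted above.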

The upper limit of normal for the ESS has been set at 10 out of a maximum score of 24 (Johns 1991). There were 13 patients whose ESS scores were 10 or less but whose partners estimated their ESS score at over 10.

Figure 2 shows the diagnostic performance of ESS >10 for MSL of 10 min (approximately the lower limit of normal) and 5 min (a definitely abnormal value). The sensitivity of an ESS>10 for an MSL of 10 min or less was 48% and the specificity was 67%. The sensitivity of an ESS>10 for an MSL of less than 5 min was 68% and the specificity was 34%.
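As a sketch of how such figures are obtained, the function below treats an ESS above the cut-off as a positive test and a short MSL as the reference standard for sleepiness, then counts the four cells of the resulting two-by-two table. The function name, its parameters and the example values are hypothetical and are not drawn from the study data.

```python
def sensitivity_specificity(ess_scores, msl_minutes, ess_cutoff=10,
                            msl_cutoff=10.0, inclusive=True):
    """Sensitivity and specificity of ESS > ess_cutoff for a short MSL.

    inclusive=True treats MSL <= msl_cutoff as sleepy; False uses MSL < msl_cutoff.
    """
    tp = fp = tn = fn = 0
    for ess, msl in zip(ess_scores, msl_minutes):
        test_positive = ess > ess_cutoff
        sleepy = msl <= msl_cutoff if inclusive else msl < msl_cutoff
        if sleepy and test_positive:
            tp += 1
        elif sleepy:
            fn += 1
        elif test_positive:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one sleepy and one non-sleepy subject")
    return tp / (tp + fn), tn / (tn + fp)


# Invented ESS scores and MSL values (minutes) for six subjects.
ess = [4, 12, 15, 8, 11, 6]
msl = [12, 4, 9, 3, 14, 11]
sens_10, spec_10 = sensitivity_specificity(ess, msl, msl_cutoff=10.0)
sens_5, spec_5 = sensitivity_specificity(ess, msl, msl_cutoff=5.0, inclusive=False)
```

The `inclusive` flag mirrors the two definitions used in the text: an MSL of 10 min or less, and an MSL of less than 5 min.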

In the logistic regression model predicting ESS scores ≤10 or >10, both MSL and global severity index were significant. The odds ratio of an ESS score greater than 10 for each minute increase in the MSL was 0.88 (95% CI 0.83, 0.94, P<0.001). The odds ratio of an ESS score greater than 10 for each point increase in the global severity index was 1.04 (95% CI 1.01, 1.07, P<0.004).
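To help interpret these per-unit odds ratios, the sketch below scales them to larger changes in the predictor. It assumes the usual logistic model that is linear in the predictor, so an odds ratio for a change of several units is the per-unit odds ratio raised to that power. The function names and the 5-minute and 10-point changes are chosen purely for illustration.

```python
import math


def log_odds_coefficient(odds_ratio_per_unit):
    """Logistic regression coefficient corresponding to a per-unit odds ratio."""
    return math.log(odds_ratio_per_unit)


def odds_ratio_for_change(odds_ratio_per_unit, delta):
    """Odds ratio for a change of `delta` units, assuming linearity in the log-odds."""
    return math.exp(delta * log_odds_coefficient(odds_ratio_per_unit))


# MSL: 0.88 per minute, so a 5-minute longer MSL gives roughly 0.53.
or_msl_5min = odds_ratio_for_change(0.88, 5)

# Global severity index: 1.04 per point, so a 10-point rise gives roughly 1.48.
or_gsi_10pt = odds_ratio_for_change(1.04, 10)
```

On this reading, a patient whose MSL is 5 min longer has roughly half the odds of reporting an ESS above 10, other things being equal.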


The results reported here suggest that the ESS and the MSLT do not measure the same aspect(s) of sleepiness, although the rank ordering of sleepy patients by the two tests was similar. In particular, the ESS is more closely associated with scores on the SCL-90 than is the MSL. If the MSLT is accepted as the gold-standard for the assessment of sleepiness, the ESS is not an adequate surrogate for clinical use, with poor sensitivity and specificity even for markedly shortened MSL.

When ESS scores were divided into ‘normal’ and ‘abnormal’ using a cut-off score of 10, both higher SCL-90 scores and objective sleepiness (MSL) were found to be independent predictors of abnormality.

The SCL-90 is described as an instrument measuring psychological symptom intensity because a high level of distress caused by one symptom has the same weight as a little distress caused by several symptoms. Although there are correlations between SCL-90 scores and psychiatric diagnoses, the SCL-90 does not measure psychiatric illness and cannot provide reliable psychiatric diagnoses. Our results should not be interpreted as showing that patients with clinical depression (for example) report higher ESS scores than others.

The SCL-90 and the ESS have a similar format, asking subjects to provide a numerical answer to questions about symptoms. It is therefore possible that correlations between ESS and SCL-90 scores could arise from a tendency of some subjects to give strongly positive responses to any series of questions. The existence and importance of this form of response bias are controversial (Streiner and Norman 1995), and neither the SCL-90 nor the ESS includes questions to adjust for answering bias. Correlations between the ESS and scales such as the MOS SF-36 (Briones et al. 1996), which have been thought important in establishing the validity of the ESS, may also be affected by this form of response bias.

The patients of this study were referred because a physician suspected a sleep disorder. Referral is likely to have been commonest when patients expressed their complaint in terms suggestive of a sleep disorder. Patients expressing fatigue and weakness without overt sleepiness may, in contrast, have been given working diagnoses of chronic fatigue and referred elsewhere. Since the ESS covers many of the questions which clinicians habitually ask to assess sleepiness, patients with high ESS scores may have been more likely to be referred. Patients with more numerous and more obvious clinical indicators of a sleep disorder are also more likely to have been referred. This referral bias would tend to increase the correlation of ESS and MSL. No validation of the ESS has been undertaken with an unbiased community sample, and caution should be exercised in extrapolating even the correlations reported here to samples of the general community.

The MSLT was designated as the gold-standard for this study, but it is recognised that the MSLT is a gold-standard only in a limited sense. The MSLT is the standard for measuring physiological sleep tendency, but this is only one aspect of sleepiness (Pivik 1990) and not necessarily the one which correlates best with the morbidity related to sleep disorders. The MSL has not been found to correlate well with other methods of assessing subjective sleepiness, whether in normal subjects (Johnson et al. 1990), patients with obstructive sleep apnoea (Dement et al. 1978) or patients with insomnia (Lichstein et al. 1994). Indeed, the correlations reported here between MSL and ESS are higher than those observed between the MSL and some other subjective measures of sleepiness (Lichstein et al. 1994).

For clinical purposes, the MSLT does have some important advantages over the ESS. In particular, its relative immunity from deliberate misrepresentation is important when decisions about a patient's fitness to engage in hazardous activities have to be made. However, not only deliberate misrepresentation is at issue. It was common for patients whose partners estimated them to be abnormally sleepy to report ESS within the normal range. This result is consistent with clinical experience and casts doubt on the reliability of the ESS in excluding significant sleepiness, even when there is no intent to deceive.

The ESS may be especially inappropriate as a surrogate for the MSLT in epidemiological studies. In any situation where the target disorder is of low (<10%) prevalence, a specificity of less than 100% seriously impairs the utility of a diagnostic test. For example, ESS over 15 had a specificity of 88% in the identification of individuals with an MSL of 5 min or less (see Fig. 2). The application of the ESS with a cut-off score of 15 would therefore give a prevalence estimate of MSL <5 min of 12% even if the true prevalence were zero.
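The arithmetic behind this example can be sketched as follows. The apparent prevalence is the proportion of true positives plus the proportion of false positives, so with a specificity of 88% and a true prevalence of zero, 12% of subjects still test positive. The function names are hypothetical, and the sensitivity of 0.5 and true prevalence of 5% used for the predictive value are assumed values for illustration only; neither is reported in the text for a cut-off of 15.

```python
def apparent_prevalence(true_prevalence, sensitivity, specificity):
    """Fraction of a population testing positive: true plus false positives."""
    true_pos = sensitivity * true_prevalence
    false_pos = (1 - specificity) * (1 - true_prevalence)
    return true_pos + false_pos


def positive_predictive_value(true_prevalence, sensitivity, specificity):
    """Fraction of positive tests that are true positives."""
    true_pos = sensitivity * true_prevalence
    false_pos = (1 - specificity) * (1 - true_prevalence)
    return true_pos / (true_pos + false_pos)


# True prevalence of zero: the sensitivity is irrelevant and 12% test positive.
at_zero = apparent_prevalence(0.0, sensitivity=0.5, specificity=0.88)

# Assumed 5% true prevalence and assumed sensitivity of 0.5:
# fewer than one in five positive tests is a true positive.
ppv_low = positive_predictive_value(0.05, sensitivity=0.5, specificity=0.88)
```

Under these assumed inputs the false positives outnumber the true positives several times over, which is the sense in which imperfect specificity impairs a test applied to a low-prevalence population.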


Figure 2. Mean sleep latency (MSL) in minutes from the multiple sleep latency test (MSLT) is plotted against ESS scores. The vertical lines indicate approximate lower limits of normal (MSL=10 min) and definite abnormality (MSL=5 min). The horizontal line indicates lower limit of normal for the ESS (ESS=10). See text for statistical significance.


Although the ESS estimate of sleepiness is a number, there is no evidence that it is an interval scale (i.e. that the interval between 1 and 2 is the same as the interval between 9 and 10 or that between 23 and 24). Indeed, consideration of the questions asked in the ESS suggests that the intervals of the ESS are very unlikely to be equal (for example, the difference between ‘no chance’ and ‘moderate chance’ of falling asleep after a lunch without alcohol is very unlikely to be the same as the difference between ‘no chance’ and ‘moderate chance’ of falling asleep while stopped for a few minutes in traffic when driving a car). Whether the ESS is so far from being an interval scale that it is invalid to calculate a mean value for ESS scores among individuals or to use ESS scores in regression models we do not know. Because the possibility exists, however, we have avoided as far as possible statistical analyses that require the assumption that the ESS is an interval scale. The only exception is our presentation of mean ESS scores, which is intended to assist comparison of our results with those of others.

The correlation between MSL and ESS reported here (0.30) was lower than that reported by Johns (1993) (0.42) but close to that reported by Briones et al. (1996) (0.27). Johns (1991), however, observed a significant correlation between ESS and apnoea/hypopnoea index and variation of ESS scores among grades of sleep-disordered breathing (Johns 1994) which we have not found. Both in clinic (Crocker et al. 1990) and in population samples (Olson et al. 1995; Bliwise and King 1996), it has been difficult to show that complaints of sleepiness predict disturbed breathing during sleep, presumably because other causes of sleepiness are common.

Johns (1991) also observed a clear separation of ESS scores among diagnostic groups other than sleep-disordered breathing, with no narcoleptic patient having an ESS score less than 13 and no insomniac having an ESS score over 6. We observed variation of ESS scores among diagnostic groups broadly consistent with Johns’ results, but not nearly so neat a separation. One insomniac patient had an ESS of 20 and a patient with clear-cut narcolepsy had an ESS of 4 (see Table 1). Because of these outlying results, the modes of ESS scores were more clearly separated than the mean scores. Whether the outlying results are statistical outliers, which would be expected in any large unselected case series, or arise from illiteracy or incomprehension of the ESS form is unclear.

A number of differences of setting and methodology should be noted that may be important in interpreting the differences between these results and those reported by Johns (1991, 1993). The samples studied by Johns (1991, 1993) were drawn from a referral clinic located in a private hospital and the patients were exclusively fee-paying. The pathways to referral, and therefore the potential referral bias, may have been quite different to those followed by the present sample, of which only about 20% were fee-paying. Referral bias could affect the results in two ways. It could contribute to a lack of correlation between measures of sleepiness and apnoea/hypopnoea index and to a lack of differences among diagnostic groups if only or mainly sleepy patients are referred. In this circumstance a correlation present in the whole population of patients may be obliterated or obscured. A private clinic may have more patients with minimal sleepiness who are nevertheless referred for detailed evaluation of conditions such as snoring. Conversely, if referring doctors used similar questions to those of the ESS to decide which patients need referral, the absence of patients with low ESS scores in some diagnostic groups might also be artefactual.

Johns administered the ESS at the end of the first clinical interview (Johns 1991), while in the present study the ESS was completed days to weeks after the clinical assessment. Johns’ clinical interviews must have included discussion of sleepiness, and may have sensitised the patients to the issue of sleepiness and improved the accuracy of their assessment.

Finally, we did not check that our subjects could read the ESS form or that they had understood the directions. We do not know the prevalence in our sample of difficulty reading English. However, the prevalence of patients who neither speak nor read English well and of native speakers of English who read poorly would certainly be higher in our community than in that served by Johns’ clinic.

Sleepiness is a complex phenomenon which, in clinical practice, represents an amalgam of subjective complaint and objective sleep tendency. The factors that lead to a subjective complaint of severe sleepiness are poorly defined. Our results suggest that they include not only increased sleep tendency but psychological factors as well.


  • Bland, J. M., Altman, D. G. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1986, i: 307–310.
  • Bliwise, D. L., King, A. C. Sleepiness in clinical and nonclinical populations. Neuroepidemiology, 1996, 15: 161–165.
  • Briones, B., Adams, A., Strauss, M. Relationship between sleepiness and general health status. Sleep, 1996, 19: 583–588.
  • Crocker, B. D., Olson, L. G., Saunders, N. A., Hensley, M. J., McKeon, J. L., Murree-Allen, K., Gyulay, S. G. Estimation of the probability of disturbed breathing during sleep before a sleep study. Am. Rev. Respir. Dis., 1990, 142: 14–18.
  • Dement, W. C., Carskadon, M. A., Richardson, G. Excessive daytime sleepiness in the sleep apnea syndrome. In: C. Guilleminault and W. C. Dement (Eds) Sleep Apnea Syndromes. Alan R. Liss, New York, 1978: 23–46.
  • Diagnostic Classification Steering Committee (Chairman: M. J. Thorpy). International Classification of Sleep Disorders: Diagnostic and Coding Manual. American Sleep Disorders Association, Rochester, Minnesota, 1990.
  • Fukuda, K., Strauss, S. E., Hickie, I. The chronic fatigue syndrome: A comprehensive approach to its definition and study. Ann. Intern. Med., 1990, 121: 953–959.
  • Gyulay, S. G., Olson, L. G., Hensley, M. J., King, M. T., Murree-Allen, K., Saunders, N. A. Comparison of clinical assessment and home oximetry in the diagnosis of obstructive sleep apnoea. Am. Rev. Respir. Dis., 1993, 147: 50–53.
  • Johns, M. W. A new method for measuring daytime sleepiness: The Epworth Sleepiness Scale. Sleep, 1991, 14: 540–545.
  • Johns, M. W. Daytime sleepiness, snoring, and obstructive sleep apnea; The Epworth Sleepiness Scale. Chest, 1993, 103: 30–36.
  • Johns, M. W. Sleepiness in different situations measured by the Epworth Sleepiness Scale. Sleep, 1994, 17: 703–710.
  • Johnson, L. C., Freeman, C. R., Spinweber, C. L., Gomez, S. A. The relationship between subjective and objective measures of sleepiness. Psychophysiology, 1990, 28: 65–71.
  • Lichstein, K. L., Wilson, N. M., Noe, S. L., Aguillard, R. N., Bellur, S. N. Daytime sleepiness in insomnia: Behavioural, biological and subjective indices. Sleep, 1994, 17: 693–702.
  • Olson, L. G., King, M. T., Hensley, M. J., Saunders, N. A. A community study of sleep-disordered breathing: clinical syndromes. Am. J. Respir. Crit. Care Med., 1995, 152: 707–710.
  • Pivik, R. T. The several qualities of sleepiness: Psychophysiological considerations. In: T. H. Monk (Ed.) Sleep, Sleepiness and Performance. Wiley, Chichester, 1990: 3–38.
  • Streiner, D. L., Norman, G. R. Health Measurement Scales. Oxford University Press, Oxford, 1995.
  • Wolfe, F., Smythe, H. A., Yunus, M. B. The American College of Rheumatology criteria for the classification of fibromyalgia: Report of the multicenter criteria committee. Arthritis Rheumatism, 1990, 33: 160–172.