The modal and mean value of ESS and the mean MSL for each diagnosis with five or more cases are shown in Table 1. The remaining diagnoses were very heterogeneous: three of anxiety disorder, two of insufficient sleep syndrome, and one each of REM behaviour disorder, sleep misperception, central sleep apnoea, depression, advanced sleep phase syndrome, bronchiectasis and sleep walking. In general, higher ESS scores and lower MSL were observed in patients with diagnoses associated with prominent sleepiness, although there were individuals in all diagnostic groups with outlying results. The modal ESS scores for different diagnoses were more clearly separated than the mean scores, reflecting the disproportionate influence on the mean value of patients with outlying results.
Correlations between ESS scores and the other variables for the whole sample and for patients with MSL less than 5 min are shown in Table 2, and between MSL and the same variables in Table 3. The observed correlations were modest, and in the sample as a whole no variable was to any significant extent more closely related to the ESS or MSL than several others. MSL was, as would be expected, more closely related to the sleep variables from the preceding night. The correlation of MSL and scores on the somatisation subscale of the SCL-90 was of borderline statistical significance, taking account of the number of correlations estimated. The ESS was correlated with the global severity index of the SCL-90 and with all the subscales except psychoticism.
When only objectively sleepy patients (those with MSL less than 5 min) were considered, the correlation between ESS and MSL was lost. In this group, only scores on the somatisation subscale of the SCL-90 were clearly correlated with the ESS, although this association and those with depression and the global severity index were closer than in the group as a whole.
Partners’ estimates of the patients’ sleepiness using the ESS were correlated with patients’ own estimates of their sleepiness (Spearman ρ=0.72, P<0.0001). Although overall agreement was relatively close, marked disagreement was common. Figure 1 displays a Bland-Altman plot (Bland and Altman 1986) of the difference between the patients’ and their partners’ ESS estimates against the mean of the patients’ and their partners’ estimates. There was no systematic variation in the disagreement as the mean values increased, but it is clear that discrepancies tended to be larger when the mean ESS was lower. In the range of mean ESS from 6 to 12, the standard deviation of the difference between patients’ and partners’ estimates was 4.4 and the range was –12 to 13. In the range of mean ESS from 12 to 18, the standard deviation of the difference was 3.4 and the range was –9 to 5.
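The quantities behind a Bland-Altman comparison of this kind can be sketched as follows. This is a minimal illustration only: the scores are invented, the function names are hypothetical, the sign convention (patient minus partner) is an assumption, and treating each band of mean ESS as half-open is also an assumption, since the text does not say how boundary values were assigned.

```python
from statistics import stdev

def bland_altman(patient, partner):
    """Return (means, diffs) for paired ESS scores.

    diffs are patient minus partner (assumed sign convention).
    """
    means = [(a + b) / 2 for a, b in zip(patient, partner)]
    diffs = [a - b for a, b in zip(patient, partner)]
    return means, diffs

def spread_in_band(means, diffs, lo, hi):
    """SD, minimum and maximum of the differences for pairs whose
    mean score lies in the half-open band [lo, hi)."""
    band = [d for m, d in zip(means, diffs) if lo <= m < hi]
    return stdev(band), min(band), max(band)

# Invented scores for illustration; not data from the study.
patient = [4, 10, 14, 8, 16, 12, 18, 6]
partner = [10, 6, 15, 12, 13, 16, 17, 9]

means, diffs = bland_altman(patient, partner)
low_band = spread_in_band(means, diffs, 6, 12)
high_band = spread_in_band(means, diffs, 12, 18)
```

Comparing the standard deviation and range of the differences in the lower and upper bands mirrors the comparison reported above.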
In the logistic regression model predicting ESS scores ≤10 or >10, both MSL and global severity index were significant. The odds ratio of an ESS score greater than 10 for each minute increase in the MSL was 0.88 (95% CI 0.83, 0.94, P<0.001). The odds ratio of an ESS score greater than 10 for each point increase in the global severity index was 1.04 (95% CI 1.01, 1.07, P<0.004).
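Because a logistic model is linear in the log odds, a per-unit odds ratio can be rescaled to a larger change in the predictor by raising it to the corresponding power. The sketch below applies this to the figures quoted above; the function name and the choice of a 5 min change are illustrative assumptions, not part of the original analysis.

```python
def scaled_odds_ratio(or_per_unit, k):
    """Odds ratio for a k-unit change, given the odds ratio per unit.

    Valid only under the logistic model's assumption that the log odds
    are linear in the predictor.
    """
    return or_per_unit ** k

# Per-minute odds ratio for MSL and its 95% CI, as reported in the text.
or_msl, ci_low, ci_high = 0.88, 0.83, 0.94

# Illustrative rescaling to a 5 min longer MSL (the 5 is an assumed example).
or_5min = scaled_odds_ratio(or_msl, 5)
ci_5min = (scaled_odds_ratio(ci_low, 5), scaled_odds_ratio(ci_high, 5))
```

On these figures, a 5 min longer MSL corresponds to an odds ratio of roughly 0.53 for an ESS score greater than 10, that is, the odds are approximately halved.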
The results reported here suggest that the ESS and the MSLT do not measure the same aspect(s) of sleepiness, although the rank ordering of sleepy patients by the two tests was similar. In particular, the ESS is more closely associated with scores on the SCL-90 than the MSL. If the MSLT is accepted as the gold-standard for the assessment of sleepiness, the ESS is not an adequate surrogate for clinical use, with poor sensitivity and specificity even for markedly shortened MSL.
When ESS scores were divided into ‘normal’ and ‘abnormal’ using a cut-off score of 10, both higher SCL-90 scores and objective sleepiness (MSL) were found to be independent predictors of abnormality.
The SCL-90 is described as an instrument measuring psychological symptom intensity because a high level of distress caused by one symptom has the same weight as a little distress caused by several symptoms. Although there are correlations between SCL-90 scores and psychiatric diagnoses, the SCL-90 does not measure psychiatric illness and cannot provide reliable psychiatric diagnoses. Our results should not be interpreted as showing that patients with clinical depression (for example) report higher ESS scores than others.
The SCL-90 and the ESS have a similar format, asking subjects to provide a numerical answer to questions about symptoms. It is therefore possible that correlations between ESS and SCL-90 scores could arise from a tendency of some subjects to give strongly positive responses to any series of questions. The existence and importance of this form of response bias are controversial (Streiner and Norman 1995), and neither the SCL-90 nor the ESS includes questions to adjust for answering bias. Correlations between the ESS and scales such as the MOS SF-36 (Briones et al. 1996), which have been thought important in establishing the validity of the ESS, may also be affected by this form of response bias.
The patients of this study were referred because a physician suspected a sleep disorder. Referral is likely to have been commonest when patients expressed their complaint in terms suggestive of a sleep disorder. Patients expressing fatigue and weakness without overt sleepiness may, in contrast, have been given working diagnoses of chronic fatigue and referred elsewhere. Since the ESS covers many of the questions which clinicians habitually ask to assess sleepiness, patients with high ESS scores may have been more likely to be referred. Patients with more numerous and more obvious clinical indicators of a sleep disorder are also more likely to have been referred. This referral bias would tend to increase the correlation of ESS and MSL. No validation of the ESS has been undertaken with an unbiased community sample, and caution should be exercised in extrapolating even the correlations reported here to samples of the general community.
The MSLT was designated as the gold-standard for this study, but it is recognised that the MSLT is a gold-standard only in a limited sense. The MSLT is the standard for measuring physiological sleep tendency, but this is only one aspect of sleepiness (Pivik 1990) and not necessarily the one which correlates best with the morbidity related to sleep disorders. The MSL has not been found to correlate well with other methods of assessing subjective sleepiness, whether in normal subjects (Johnson et al. 1990), patients with obstructive sleep apnoea (Dement et al. 1978) or patients with insomnia (Lichstein et al. 1994). Indeed, the correlations reported here between MSL and ESS are higher than those observed between the MSL and some other subjective measures of sleepiness (Lichstein et al. 1994).
For clinical purposes, the MSLT does have some important advantages over the ESS. In particular, its relative immunity from deliberate misrepresentation is important when decisions about a patient's fitness to engage in hazardous activities have to be made. However, not only deliberate misrepresentation is at issue. It was common for patients whose partners estimated them to be abnormally sleepy to report ESS within the normal range. This result is consistent with clinical experience and casts doubt on the reliability of the ESS in excluding significant sleepiness, even when there is no intent to deceive.
The ESS may be especially inappropriate as a surrogate for the MSLT in epidemiological studies. In any situation where the target disorder is of low (<10%) prevalence, specificity less than 100% seriously impairs the utility of a diagnostic test. For example, ESS over 15 had a specificity of 88% in the identification of individuals with an MSL of 5 min or less (see Fig. 2). The application of the ESS with a cut-off score of 15 would therefore give a prevalence estimate of MSL <5 min of 12% even if the true prevalence were zero.
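The arithmetic behind this point can be made explicit. The proportion of a population that tests positive is the sum of true positives and false positives, so imperfect specificity sets a floor on the apparent prevalence. In the sketch below the specificity of 0.88 is taken from the text, whereas the sensitivity of 0.5 and the true prevalence of 5% are purely hypothetical values chosen for illustration, and the function names are our own.

```python
def apparent_prevalence(true_prev, sens, spec):
    """Proportion testing positive: true positives plus false positives."""
    return sens * true_prev + (1 - spec) * (1 - true_prev)

def positive_predictive_value(true_prev, sens, spec):
    """Proportion of positive tests that are true positives."""
    return sens * true_prev / apparent_prevalence(true_prev, sens, spec)

SPEC = 0.88        # specificity of ESS over 15, from the text
SENS_ASSUMED = 0.5  # hypothetical; not a value reported in the text

# With no true cases at all, every positive is a false positive.
floor = apparent_prevalence(0.0, SENS_ASSUMED, SPEC)

# Hypothetical low-prevalence setting (5% truly have the short MSL).
ppv_low_prev = positive_predictive_value(0.05, SENS_ASSUMED, SPEC)
```

With a true prevalence of zero the apparent prevalence equals 1 minus the specificity, which is the 12% quoted above; under the hypothetical values, fewer than one in five positive results would be true positives.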
Figure 2. Mean sleep latency (MSL) in minutes from the multiple sleep latency test (MSLT) is plotted against ESS scores. The vertical lines indicate approximate lower limits of normal (MSL=10 min) and definite abnormality (MSL=5 min). The horizontal line indicates lower limit of normal for the ESS (ESS=10). See text for statistical significance.
Although the ESS estimate of sleepiness is a number, there is no evidence that it is an interval scale (i.e. that the interval between 1 and 2 is the same as the interval between 9 and 10 or that between 23 and 24). Indeed, consideration of the questions asked in the ESS suggests that the intervals of the ESS are very unlikely to be equal (for example, the difference between ‘no chance’ and ‘moderate chance’ of falling asleep after a lunch without alcohol is very unlikely to be the same as the difference between ‘no chance’ and ‘moderate chance’ of falling asleep while stopped for a few minutes in traffic when driving a car). Whether the ESS is so far from being an interval scale that it is invalid to calculate a mean value for ESS scores among individuals or to use ESS scores in regression models we do not know. Because the possibility exists, however, we have avoided as far as possible statistical analyses that require the assumption that the ESS is an interval scale. The only exception is our presentation of mean ESS scores, which is intended to assist comparison of our results with those of others.
The correlation between MSL and ESS reported here (0.30) was lower than that reported by Johns (1993) (0.42) but close to that reported by Briones et al. (1996) (0.27). Johns (1991), however, observed a significant correlation between ESS and apnoea/hypopnoea index and variation of ESS scores among grades of sleep-disordered breathing (Johns 1994) which we have not found. Both in clinic (Crocker et al. 1990) and in population samples (Olson et al. 1995; Bliwise and King 1996), it has been difficult to show that complaints of sleepiness predict disturbed breathing during sleep, presumably because other causes of sleepiness are common.
Johns (1991) also observed a clear separation of ESS scores among diagnostic groups other than sleep-disordered breathing, with no narcoleptic patient having an ESS score less than 13 and no insomniac having an ESS score over 6. We observed variation of ESS scores among diagnostic groups broadly consistent with Johns’ results, but not nearly so neat a separation. One insomniac patient had an ESS of 20 and a patient with clear-cut narcolepsy had an ESS of 4 (see Table 1). Because of these outlying results, the modes of ESS scores were more clearly separated than the mean scores. Whether the outlying results are statistical outliers, which would be expected in any large unselected case series, or arise from illiteracy or incomprehension of the ESS form is unclear.
A number of differences of setting and methodology should be noted that may be important in interpreting the differences between these results and those reported by Johns (1991, 1993). The samples studied by Johns (1991, 1993) were drawn from a referral clinic located in a private hospital and the patients were exclusively fee-paying. The pathways to referral, and therefore the potential referral bias, may have been quite different to those followed by the present sample, of which only about 20% were fee-paying. Referral bias could affect the results in two ways. It could contribute to a lack of correlation between measures of sleepiness and apnoea/hypopnoea index and to a lack of differences among diagnostic groups if only or mainly sleepy patients are referred. In this circumstance a correlation present in the whole population of patients may be obliterated or obscured. A private clinic may have more patients with minimal sleepiness who are nevertheless referred for detailed evaluation of conditions such as snoring. Conversely, if referring doctors used similar questions to those of the ESS to decide which patients need referral, the absence of patients with low ESS scores in some diagnostic groups might also be artefactual.
Johns administered the ESS at the end of the first clinical interview (Johns 1991), while in the present study the ESS was completed days to weeks after the clinical assessment. Johns’ clinical interviews must have included discussion of sleepiness, and may have sensitised the patients to the issue of sleepiness and improved the accuracy of their assessment.
Finally, we did not check that our subjects could read the ESS form or that they had understood the directions. We do not know the prevalence in our sample of difficulty reading English. However, the prevalence of patients who neither speak nor read English well and of native speakers of English who read poorly would certainly be higher in our community than in that served by Johns’ clinic.
Sleepiness is a complex phenomenon which, in clinical practice, represents an amalgam of subjective complaint and objective sleep tendency. The factors that lead to a subjective complaint of severe sleepiness are poorly defined. Our results suggest that they include not only increased sleep tendency but psychological factors as well.