Prof. John Trinder, Department of Psychology, University of Melbourne, Parkville, Victoria 30052, Australia. Tel.: +61 3 9344 4549; fax: +61 3 9347 6618; e-mail: firstname.lastname@example.org
The sensitivity and specificity of four self-report measures of disordered sleep – the Sleep Impairment Index (SII), the Sleep Disorders Questionnaire (SDQ), the Dysfunctional Beliefs and Attitudes About Sleep Scale (DBAS) and the Sleep–Wake Activity Inventory (SWAI) – were compared in subjects with insomnia and normal sleep. Nineteen young adult subjects met DSM-IV criteria for primary insomnia and another 19 were normal control subjects. Discriminatory characteristics of each measure were assessed using receiver operator characteristic curve analyses. Discriminatory power was maximised for each measure to produce cut-scores applicable for identification of individuals with insomnia. The DBAS, SII and SDQ psychiatric DIMS subscale were found to correlate, and discriminated well between the two groups. The SWAI nocturnal sleep subscale was not found to be an accurate discriminator. The results suggest differences in the measures in their ability to detect insomnia, and offer guidelines as to the optimal use of test scores to identify young adults suspected of insomnia.
Insomnia is the subjective complaint of insufficient or inadequate sleep. It occurs with few physical signs, and is defined largely on the basis of the patient’s self-report (Aldrich 1993). A wide range of prevalence has been reported for complaints of insomnia in population studies. Estimated population prevalence rates vary from around 2% to over 40% (Karacan et al. 1976; Liljenberg et al. 1988, 1989; Reite et al. 1995; Rosekind 1992). Much of this variability is probably due to differences in the questions used to elicit insomnia, and to differences in the definition of insomnia, both with respect to symptom type and time period. Lack of standardization is a significant problem and both reflects and maintains conceptual confusion regarding the nature of insomnia. However, self-report remains the easiest, cheapest and most widely used method of collecting data about an individual’s health and risk factor status.
A number of relatively brief self-report measures have been developed to detect and quantify sleep impairment, including insomnia, in various populations. These include the Sleep Impairment Index (SII) (Morin 1993), the Sleep Disorders Questionnaire (SDQ) (Douglass et al. 1994), the Dysfunctional Beliefs and Attitudes About Sleep Scale (DBAS) (Morin 1993), and the Sleep–Wake Activity Inventory (SWAI) (Rosenthal et al. 1993). These self-report measures are subjective and retrospective, and therefore contrast with objective measures of sleep disturbance such as polysomnography (PSG) or actigraphy. However, they can provide important information regarding psychological and behavioural aspects of the sleep complaint. This type of information is particularly useful in the case of insomnia, as broadly cognitive-behavioural treatments can be particularly effective (Morin et al. 1999).
Little information regarding the properties of these measures in samples other than those used in the first instance has been published (Douglass et al. 1994; Rosenthal et al. 1993), limiting the generalizability of findings. Previous studies have examined a single test administered to a specific subject population. The one exception (Blais et al. 1997) evaluated convergent validity of French language versions of the Pittsburgh sleep quality index, the DBAS and the SII. Thus, comparisons of the sensitivity and specificity of these self-report measures can only be made by examining results reported across studies. This approach has the potential to limit conclusions regarding the relative measurement properties of different tests. Another difficulty is that studies reporting the sensitivity and specificity of screening tests often used a ‘convenience sample’ in which patients were retrospectively obtained from hospital records, rather than by using prospective diagnostic criteria.
Insomnia has been variously regarded as a disease, a complaint, a symptom and a finding (Kryger et al. 1999). While the International Classification of Sleep Disorders, Revised (ICSD-R) may be regarded as the ‘gold standard’ for classification of sleep disorders, there is no widely accepted ‘gold standard’ for the diagnosis of insomnia. Insomnia is primarily defined by the nature of the sufferer’s complaint and this makes assessing the criterion-related validity of self-report based detection methods problematic. Evaluation of the psychometric properties of self-report measures may provide one solution to this difficulty. For example, measures purporting to measure insomnia should be correlated if they are measuring the same construct.
The present study compared the performance of four self-report measures of sleep in a sample of subjects with a well-defined complaint of insomnia, and a sample of normal sleepers. It was expected that the measures designed specifically to detect or describe insomnia would have convergent validity, and adequately discriminate between the two groups.
Subjects were administered four self-report measures of sleep disturbance – the SII, SDQ, DBAS and SWAI. Each subject’s ability to attain objectively defined sleep was assessed using PSG in a multiple sleep latency protocol. The ability of each self-report measure to discriminate between the two groups, an index of criterion-related validity, was assessed with receiver operator characteristic (ROC) analysis. The relationship between the measures, their convergent validity, was assessed with correlation analyses.
Subjects were recruited from a university student population. Potential subjects responded to advertisements for either good sleepers or for people having trouble getting and/or staying asleep. Potential subjects underwent a screening evaluation by a registered psychologist. The evaluation consisted of a sleep history interview and brief psychological assessment.
The first group consisted of subjects with a primary complaint of insomnia. Criteria for subjects included in the insomnia group were that they satisfied DSM-IV (307.42) criteria for primary insomnia, and at least minimal criteria for psychophysiological insomnia under ICSD-R (307.42–0). The inclusion criteria were (1) sleep-onset latency or wake after sleep onset >30 min at least three nights each week; (2) insomnia duration of at least 1 month; (3) a complaint of significant daytime consequences of insomnia.
The second group comprised normal sleepers matched with respect to age, gender and education. Subjects in the control condition did not meet criteria for insomnia under DSM-IV (307.42), nor did they show indications of sleep disorders under ICSD-R criteria. Criteria for subjects in the control group were (1) usual sleep onset latency <25 min; (2) infrequent napping; and (3) denial of any sleep disturbance.
Potential subjects were excluded if medical or other factors known to affect sleep were significant, and all were medication free. Subjects were also screened for psychopathology known to affect sleep using the Beck anxiety inventory (BAI) (Beck et al. 1988) and the Beck depression inventory (BDI) (Beck et al. 1961). Beck et al. (1961) suggest that scores in the range 0–9 for the BDI are considered normal or asymptomatic, scores of 10–18 indicate mild-moderate depression, scores of 19–29 indicate moderate-severe depression, and scores above 30 indicate extremely severe depression. However, as there is known to be a strong relationship between insomnia and depression (Ware and Morin 1997), the exclusion cut-off was raised to 18, eliminating severely depressed individuals but accepting those mildly depressed.
Thirty-eight subjects (33 females and five males) participated in the experimental nights to provide two groups of 19 subjects. Mean raw scores for the BAI and BDI in the insomnia group were 10.3 (±7) and 9.2 (±6), respectively, while the values for the control group were 3.9 (±5) and 2.1 (±2), respectively. No subjects reported suicidal ideation (item 9) with a rating of 3 or 4. All the subjects reported being non-smokers (currently and in the past), and free of respiratory problems (including asthma). The difference in age between subjects in the insomnia group (M=19.6 ± 1.5) and the control group (M=23.3 ± 4.8) was not significant (F1,37=0.004, P > 0.05). The University of Melbourne Human Research Ethics committee approved the study. Subjects were compensated for their participation.
Sleep diary reports
Habitual sleep was assessed for 2 weeks prior to assessment using the Pittsburgh sleep diary (Monk et al. 1994). Analysis consisted of one-way analysis of variance (ANOVA) of the factor of group (control vs. insomnia). Data were corrected for skew by square-root transform. The difference in self-reported ‘typical’ sleep-onset latency (SOL) between the insomnia (X=73.9 ± 28.7 min) and the control group (X=15.0 ± 7.1 min) was significant (F1,37=71.1, P < 0.001), as was the difference in self-reported total duration of wake after sleep onset (WASO) between the insomnia (X=36.7 ± 42.0 min) and the control group (X=1.4 ± 3.0 min) (F1,37=17.7, P < 0.001). Subjects in the insomnia group reported the duration since the initial onset of their insomnia as 7.3 ± 6.4 years. The minimum duration criterion was 1 month, in accordance with DSM-IV.
The four self-report measures of sleep disturbance (SII, SDQ, DBAS and SWAI) were selected because they are beginning to receive wide usage and assess a range of factors associated with insomnia, including specific insomnia symptomatology, daytime consequences and psychological sequelae. The DBAS in particular is being increasingly used to assess cognitive aspects of insomnia (Espie et al. 2000). In addition, two of these measures include subscales that are explicitly constructed to assess sleep disorders other than insomnia.
The SII (Morin et al. 1994) is a self-report instrument that elicits the subject’s perception of the level of severity, distress and impairment of daytime functioning associated with his or her insomnia. The SII comprises five items assessing: severity of sleep-onset, sleep maintenance and early morning awakening problems; satisfaction with current sleep patterns; interference with daily functioning; how noticeable the impairment attributed to the sleep problem appears; and level of distress caused by the sleep problem.
The SWAI (Rosenthal et al. 1993) is a 59-item self-report measure of sleepiness. It comprises six factors: excessive daytime sleepiness, psychic distress, social desirability, energy level, ability to relax and nocturnal sleep. This inventory was designed specifically to identify excessive daytime sleepiness and has been validated against the multiple sleep latency test (MSLT) (Rosenthal et al. 1993). Items in a number of the factor scales also have relevance to insomnia, particularly those related to daytime energy levels, nocturnal sleep and social desirability. It might be expected that those with insomnia would report poorer nocturnal sleep and possibly increased daytime energy levels.
The SDQ (Douglass et al. 1994) is a 176-item questionnaire designed to assess the presence of common sleep disorders. It comprises four main factors: sleep apnoea, narcolepsy, psychiatric sleep disturbance and periodic limb movement disorders. The authors state that the questionnaire is designed for diagnosis rather than description of sleep disorders (an unpublished manual is available from the authors).
The DBAS (Morin et al. 1994) is designed to assess sleep-related cognitions. It comprises five item types: misconceptions about the cause of insomnia; misattributions about the consequences of insomnia; unrealistic expectations about sleep; control and predictability of sleep; and mistaken beliefs about sleep-promoting behaviours.
Subjects completed the measures as part of a fixed inventory prior to the laboratory nights. Table 1 presents the structure of each scale and the response type required of the participants. Full descriptions of each measure are provided in the original papers.
Table 1. Scale and subscale labels with item number and response type
Assessment of test properties
Receiver–operator characteristic (ROC) curves were used to assess the overall sensitivity and specificity of each scale as a measure of insomnia. Receiver–operator characteristic curves provide a graphical representation of the relationship between sensitivity (likelihood of detecting insomnia when it is present) and specificity (likelihood of rejecting insomnia when it is not present) over all possible response values (Beck and Schultz 1986). The area under the ROC curve (AUC) indicates the performance characteristics of a test and is the most commonly used index of diagnostic performance in the ROC literature (Hanley and McNeil 1982; Mossman and Somoza 1991; Vida et al. 1994). The ROC curve for a non-discriminating measure would be a straight diagonal line from the bottom left hand corner to the upper right hand corner of the graph, and the AUC would be 0.5. The higher the curve above the chance line, the more predictive power the test has (Katz and Foxman 1993). A perfectly discriminating test would have an AUC of 1, and the curve for the test would rise vertically from the bottom left hand corner to the top left hand corner and then run horizontally to the upper right hand corner of the graph (Deyo et al. 1991; Katz and Foxman 1993; Mossman and Somoza 1991; Van der Schouw et al. 1992). Area under the ROC curve values were interpreted using guidelines specified by Swets (1988), which define ‘low accuracy’ as tests with AUC values between 0.5 and 0.7, ‘moderate accuracy’ as tests with AUC values between 0.7 and 0.9, and ‘high accuracy’ as tests with AUC values >0.9. Another definition of accuracy is the proportion of all test results (positive and negative) that are correct.
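As an illustration of the AUC described above, the following minimal Python sketch computes it from two sets of scores, using the equivalence between the AUC and the probability that a randomly chosen subject with insomnia scores higher than a randomly chosen control (Hanley and McNeil 1982). The scores are invented, the function names are hypothetical, and the sketch assumes that higher scores indicate a more severe complaint. It is not the software used in the study.

```python
def auc(insomnia_scores, control_scores):
    """Area under the ROC curve via pairwise comparison.

    Each (insomnia, control) pair counts 1 if the insomnia score is
    higher, 0.5 if the scores tie, and 0 otherwise. The mean over all
    pairs equals the area under the empirical ROC curve.
    """
    wins = 0.0
    for p in insomnia_scores:
        for n in control_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(insomnia_scores) * len(control_scores))


def swets_category(area):
    """Accuracy label following the Swets (1988) bands quoted in the text.

    Assumption: the boundary value 0.7 is placed in the 'moderate' band
    and values below 0.5 are labelled 'low'; the text leaves both open.
    """
    if area > 0.9:
        return "high"
    if area >= 0.7:
        return "moderate"
    return "low"


# Invented scores for illustration only (not data from the study).
example_insomnia = [18, 15, 12, 20, 9]
example_control = [3, 7, 10, 5, 8]

area = auc(example_insomnia, example_control)
print(area, swets_category(area))
```

The pairwise form is quadratic in the number of subjects, which is of no consequence for two groups of 19 but would be replaced by a rank-based calculation for large samples.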
Objective measurement of sleep quality
All subjects underwent PSG monitoring for two non-consecutive nights subsequent to completing the self-report measures. Subjects were asked to refrain from consuming substances that interfere with sleep (e.g. caffeine and alcohol) on the day preceding overnight data collection.
A multiple sleep onset procedure, starting at the subject’s usual bedtime, was used. Sleep onset latency was of primary interest and subjects were woken after 15 min of stage 2 sleep or greater. After completion of a sleep quality inventory they were allowed to return to sleep. This procedure was repeated six times on each night of recording. This methodology has been fully described elsewhere (Smith and Trinder 2000). The recordings for the identification of sleep state consisted of a standard electroencephalogram (EEG) montage, with central (C3-A2) and occipital (O1-A2) EEG sites. Gold cup surface electrodes were positioned according to the international 10–20 system. Electro-oculogram (EOG; bipolar, with electrodes placed at the outer canthus of each eye) and submental electromyogram (EMG) were also recorded. All sleep variables were recorded using a 16-channel polygraph (Grass polygraph model 7D, Grass Instruments Co., USA) onto paper charts with a paper speed of 10 mm/s. Occipital and central EEG were also recorded onto an IBM-compatible personal computer via a 12-bit analogue to digital converter sampling at 100 Hz for display and hard disk storage. In addition to sleep state variables, respiration was assessed with an oral/nasal thermistor and leg movements were assessed with tibial EMG.
Scoring of sleep onset was performed by visual analysis according to standard criteria (Rechtschaffen and Kales 1968) in 30-s epochs. Sleep spindles and K complexes were identified in the central EEG (indicative of stage 2 sleep onset). These distinctions were made using standard methods by an experienced scorer blind to the experimental protocols.
Objective sleep quality
A comparison was made between groups for objective SOL values as defined by the first epoch of stage 2 sleep. The first sleep onset was regarded as a measure of traditional sleep onset. The remaining sleep onsets were considered to be a generalized measure of the ability to get to sleep. The SOL data were averaged over the two nights for each subject. Mean sleep onset time for the first sleep onset was 14.36 ± 11.22 min for the control group and 30.92 ± 21.39 min for the insomnia group. Mean sleep onset time for the remaining sleep onsets was 6.69 ± 8.25 min for the control group and 14.20 ± 17.29 min for the insomnia group. Analysis consisted of a one-way analysis of variance of the factor of group (control vs. insomnia). The main effect of group was significant for the first sleep onset (F1,37=–3.49, P < 0.01) and for the remaining sleep onsets (F1,37=–3.24, P < 0.01). That is, subjects in the insomnia group took longer to reach stage 2 sleep. These results indicate that the two groups were clearly distinguished on objective measures of sleep parameters. Further, the possibility of other disorders of sleep such as sleep apnoea and periodic limb movement syndrome (PLMS) was eliminated in both groups on the basis of the polysomnography over the two nights. Some subjects in the insomnia group fell asleep faster in the laboratory than they reported doing at home. That is, a ‘reverse first night’ effect was noted in the laboratory relative to ‘typical’ sleep at home. There are theoretical explanations predicting this in psychophysiological insomniacs (Hauri and Olmstead 1989), and no subject was excluded on this basis.
It was expected that scales or subscales thought to be assessing insomnia would correlate with each other, and that those assessing other sleep problems would not correlate with those reflecting insomnia. Specifically, it was hypothesised that the SDQ psychiatric DIMS subscale, the SWAI nocturnal sleep subscale, the DBAS and the SII would correlate highly.
After checking scatterplots for outliers using SPSS-PC scatter (Tabachnick and Fidell 1989), all data points were retained for correlation analyses. Pearson’s product-moment correlation coefficients were then calculated across all scales and subscales. Table 2 shows the full correlation matrix for scores across self-report measures. Correlations between the four scales thought to measure insomnia are shown in bold in Table 2. Three of the scales, the SDQ psychiatric DIMS factor, the DBAS and the SII, showed substantial relationships, while the SWAI nocturnal sleep factor was unrelated, or only moderately related, to the other three.
Table 2. Pearson correlation coefficients for average scores across self-report measures for the total sample (control and insomnia groups)
Relationships between subscales predicted to assess insomnia and those predicted not to do so are underlined in Table 2. A number were significant; however, the average correlation was only 0.38, with 10 out of 32 correlations being significant. This compares to an average correlation of 0.58 and 3 of 6 comparisons being significant for the insomnia scales. The difference between the two sets of correlations was greater when the SWAI nocturnal sleep factor was disregarded. Between the insomnia scales and the others the average correlation was 0.43 with 6 of 24 significant, while between the three insomnia scales the correlation was 0.76 with 3 of 3 comparisons significant. Thus, there was strong evidence of convergent validity for three of the four scales predicted to measure insomnia.
The remaining correlations were between scales not designed to measure insomnia. As would be anticipated, while there were a number of significant correlations, the relationships were generally low to moderate.
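The convergent validity comparison above rests on two simple quantities: the Pearson correlation between pairs of scales, and the average of those correlations within a set of scales. The following minimal Python sketch shows both. The scale labels and scores are invented for illustration, the function names are hypothetical, and no significance testing is included.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_r(scales):
    """Average correlation over every distinct pair of scales.

    `scales` maps a scale label to its list of subject scores, with
    subjects in the same order for every scale.
    """
    pairs = combinations(sorted(scales), 2)
    rs = [pearson_r(scales[a], scales[b]) for a, b in pairs]
    return sum(rs) / len(rs)


# Invented scores for four subjects on three hypothetical scales.
example_scales = {
    "scale_a": [10, 14, 19, 25],
    "scale_b": [3, 5, 6, 9],
    "scale_c": [40, 38, 45, 41],
}
print(round(mean_pairwise_r(example_scales), 2))
```

With four scales this averages six correlations, and with three scales it averages three, matching the numbers of comparisons reported for the insomnia scales.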
Table 3. Area under the receiver operator characteristic (ROC) curve (AUC) range and accuracy categories for each measure
Self-report factors that discriminated between the two groups at a level significantly better than chance for the SDQ were the Psychiatric DIMS factor, the Apnoea factor, the Narcolepsy factor and the PLMS factor. In the SWAI, discriminating factors were Psychic Distress and Social Desirability. Both the DBAS and the SII had high accuracy in discrimination. The SWAI Nocturnal Sleep factor did not discriminate between the groups. Thus, three of the four predicted insomnia scales had high accuracy, as did one scale not predicted to measure insomnia, the SDQ PLMS factor.
While ROC curves provide a useful representation of test performance, in clinical practice cut-scores are more likely to be useful in discriminating between clients with insomnia and healthy adults. Sensitivity and specificity for each of the scales were simultaneously maximized using the ROC technique to provide a ‘cut-point’ score for each scale indicating best differentiation between the two groups. Sensitivity is the extent to which the scale detects subjects with insomnia; specificity is the extent to which the scale identifies control subjects as not having insomnia. Table 4 shows performance statistics (sensitivity and specificity) for each measure at the cut-points.
Table 4. Sensitivity and specificity values at maximized discriminatory cut-off points for each measure
The results in Table 4 indicate, for example, that an SII cut-score of 14 provided optimal discrimination between the two groups, yielding a sensitivity of 94% (likelihood of detecting insomnia in a subject from the insomnia group) and specificity of 94% (likelihood of rejecting insomnia in a subject from the control group).
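A cut-score of this kind can be found by scanning every candidate score and keeping the one with the best combined sensitivity and specificity. The minimal Python sketch below does this by maximising Youden's index (sensitivity + specificity - 1). That criterion is an assumption made for illustration, since the text does not name the exact rule used. The scores are invented, the function names are hypothetical, and scores at or above the cut are treated as indicating insomnia.

```python
def sensitivity_specificity(insomnia_scores, control_scores, cut):
    """Sensitivity and specificity when scores >= cut are called 'insomnia'."""
    detected = sum(1 for s in insomnia_scores if s >= cut)
    rejected = sum(1 for s in control_scores if s < cut)
    return detected / len(insomnia_scores), rejected / len(control_scores)


def best_cut_score(insomnia_scores, control_scores):
    """Cut-score that maximises Youden's index (sens + spec - 1).

    Every observed score is tried as a candidate; ties are resolved in
    favour of the lowest candidate. Returns (cut, sensitivity, specificity).
    """
    best = None
    for cut in sorted(set(insomnia_scores) | set(control_scores)):
        sens, spec = sensitivity_specificity(insomnia_scores, control_scores, cut)
        youden = sens + spec - 1.0
        if best is None or youden > best[0]:
            best = (youden, cut, sens, spec)
    return best[1], best[2], best[3]


# Invented scores for illustration only (not data from the study).
example_insomnia = [13, 16, 18, 20, 9, 15]
example_control = [3, 5, 8, 11, 6]

cut, sens, spec = best_cut_score(example_insomnia, example_control)
print(cut, round(sens, 2), round(spec, 2))
```

Other criteria, such as the point on the ROC curve closest to the top left hand corner, can select a different cut-score from the same data, so the rule should be stated alongside any reported cut-point.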
The cut-off scores generated for the predicted insomnia measures maximized sensitivity and specificity. The sensitivity and specificity for the SDQ psychiatric DIMS factor, the DBAS and the SII were generally high, although there was some compromise in the specificity of the DBAS. Again, the SWAI nocturnal sleep factor returned poor sensitivity and specificity for the given cut-off value. The highest concurrent sensitivity and specificity in discrimination between the two groups was provided by the PLMS factor of the SDQ, a scale not predicted to be a measure of insomnia.
This study compared the differential sensitivity and specificity of four self-report measures in a cohort of insomnia and control subjects. The subjects in the insomnia group were initially self-selected poor sleepers who were subsequently shown to satisfy DSM-IV (307.42) criteria for primary insomnia and at least minimal criteria for psychophysiological insomnia under ICSD-R (307.42-0). Both these diagnostic schemes include severity criteria, such that the subjects reported significant impairment of their day to day functioning. The sleep problems reported by subjects in the insomnia group were not transient (mean duration was >7 years). Other common disorders of sleep such as sleep apnoea were ruled out. The sleep problems reported by the insomnia group were not caused by increased alcohol use, napping or irregular schedules as indicated by sleep diary and self-report. In addition, subjects in the insomnia group were free of any medication or other substance use known to affect sleep quality. They reported more time in WASO and less total sleep time than the control group. Increased SOL was confirmed by overnight PSG. Thus there was strong evidence to believe that they were genuine, albeit young, insomniacs.
The correlation analyses provide an estimate of convergent validity, that is, the extent to which the different measures of the construct (insomnia) are related to each other. Valid measures of insomnia should correlate highly with one another. This was found for the DBAS, SII and the SDQ psychiatric DIMS subscale, but not for the nocturnal sleep subscale of the SWAI. Lesser, although moderate, relationships were found between these scales and those designed to assess other aspects of poor sleep. This suggests the insomnia scales may have poor divergent validity. In general, these three scales appeared to measure similar aspects of poor sleep in the insomnia group.
Three of the four self-report measures of insomnia distinguished effectively between the insomnia and control groups, suggesting good criterion-related validity. Thus, the SDQ psychiatric DIMS factor, the DBAS and the SII were found to be highly accurate discriminators. In contrast, the SWAI nocturnal sleep subscale was not found to be an accurate discriminator.
Some factors not designed to detect insomnia, such as the SDQ PLMS factor, also discriminated between the two groups. This occurred despite no evidence of PLMS on PSG in the insomnia group. This lack of discriminant validity could reduce the utility of these measures if used to differentiate sleep disorders in populations with heterogeneous sleep disorders. That is, while some of these measures discriminate well between good sleepers and those with insomnia, it is not known how well they might discriminate between complaints of insomnia and PLMS or sleep apnoea. It is possible that these scales identify a general ‘sleep dissatisfaction’ factor and further investigation of these properties is required.
It is important to note that the relative usefulness of any given sensitivity and specificity values depends on the population prevalence of insomnia. Higher sensitivity may be required in primary diagnostic settings. It is also likely that an increased magnitude of self-reported complaint would be found in older subjects with insomnia selected from a clinical population. However, confounds of medication use and co-morbid health problems would also likely increase.
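The dependence on prevalence can be made concrete with Bayes' rule, which converts sensitivity and specificity into the probability that a positive or negative result is correct in a given population. The minimal Python sketch below applies it to the sensitivity and specificity of 94% reported for the SII cut-score, at the two extremes of the prevalence range cited in the Introduction (2% and 40%). The function name is hypothetical, and the calculation assumes that sensitivity and specificity carry over unchanged to the target population, which this study did not test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule.

    All arguments are proportions between 0 and 1. Returns (ppv, npv):
    the probability that a positive result is a true case, and the
    probability that a negative result is a true non-case.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Sensitivity and specificity as reported for the SII cut-score; the
# prevalence values are the extremes of the range cited in the text.
for prevalence in (0.02, 0.40):
    ppv, npv = predictive_values(0.94, 0.94, prevalence)
    print(prevalence, round(ppv, 2), round(npv, 2))
```

Under these assumptions roughly one in four positive results would be a true case at 2% prevalence, compared with about nine in ten at 40% prevalence, which is why the same cut-score can suit a sleep clinic better than a general population screen.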
These data suggest that the DBAS, SII and the SDQ (psychiatric DIMS and PLMS subscales) offer similar sensitivity in detecting insomnia. However, the SDQ does not appear to distinguish between insomnia and PLMS. In addition, it requires completion of 176 items, only nine of which are included in the psychiatric DIMS subscale. The SWAI nocturnal sleep subscale was predicted to detect insomnia. However, it did not discriminate between the groups, and did not correlate with other predicted measures of insomnia. The ability of the SWAI to differentiate between insomnia and other sleep disorders is unknown. The SII is the briefest of these measures and may be sufficient for identification of insomnia in primary care settings. The DBAS, although requiring responses to 30 items, provides useful information relevant to intervention into the insomnia complaint. In addition, it contains sufficient items to increase the probability of test reliability. This scale may therefore be most useful in a sleep clinic environment.
These results are the first, albeit preliminary, comparative investigation of these measures. Replication, with increased sample size, is required to further investigate the properties of these measures. Sampling individuals from a wider range of age and health status groups would increase the clinical utility of these measures, as would demonstrating the ability of these measures to discriminate between patients with insomnia and those with other disorders of sleep. This study offers guidelines as to the optimal use of test scores from these measures to screen young adults suspected of insomnia.
This work was supported by a grant from the Australian Research Council to Prof. John Trinder, and by a Melbourne Research Award to Simon Smith.