Measures of sleep in rheumatologic diseases: Epworth Sleepiness Scale (ESS), Functional Outcome of Sleep Questionnaire (FOSQ), Insomnia Severity Index (ISI), and Pittsburgh Sleep Quality Index (PSQI)

Authors

  • Theodore A. Omachi

    Corresponding author
    1. University of California, San Francisco
    • Division of Pulmonary, Critical Care, and Sleep Medicine, University of California, San Francisco, Sleep Disorders Center, 2330 Post Street, Suite 420, San Francisco, CA 94115

INTRODUCTION

Fatigue is a major symptom associated with rheumatologic diseases such as systemic lupus erythematosus and rheumatoid arthritis and may be a direct manifestation of disease activity; however, such fatigue may also be related to sleep disturbances (1, 2). Indeed, sleep disturbances are common in a variety of rheumatologic diseases (3–5). Such disturbed sleep may be due to pain, depression, lack of exercise, or corticosteroid usage (6–8). Sleep quality may also be impaired by comorbid sleep disorders, such as obstructive sleep apnea or restless legs syndrome, the prevalences of which are reported to be high in rheumatologic populations (9–12). Sleep disturbances may, in turn, impact functional disability, lower pain thresholds, or impair immune function and therefore contribute to rheumatologic-associated morbidities (13–15). Sleep disturbances in fibromyalgia and rheumatoid arthritis have received relatively more attention than in other rheumatologic diseases; however, even in fibromyalgia and rheumatoid arthritis, there are many unanswered questions related to the causes and outcomes of sleep disturbances (3).

The study of sleep disturbances can be onerous because gold standard direct tests, such as polysomnography and multiple sleep latency testing, are expensive and require a considerable time commitment from research subjects. Laboratory-based sleep studies may present an additional challenge in rheumatologic populations in whom mobility restriction and pain may significantly increase subject burden. Therefore, there is strong impetus for utilizing patient-reported measures in assessing sleep and sleep-related outcomes in rheumatologic diseases.

Four patient-reported measures are discussed in this section, each of which captures a different sleep-related domain and has been extensively utilized in a variety of populations: 1) the Epworth Sleepiness Scale, which assesses daytime sleepiness, 2) the Functional Outcome of Sleep Questionnaire, which assesses sleep-related quality of life, 3) the Insomnia Severity Index, which measures the subjective symptoms and consequences of difficulties initiating and maintaining sleep, and 4) the Pittsburgh Sleep Quality Index, which more generally assesses perceived sleep quality. Please note that the Medical Outcomes Study Sleep Scale, a global measure of sleep quality and sleep-related outcomes, is discussed separately in the Fibromyalgia Section of this issue. None of the scales reviewed here were developed specifically for rheumatologic or musculoskeletal conditions and, indeed, each has relied heavily on populations with primary sleep disorders for validation. To varying extents, as discussed below, each of these measures has been used in rheumatologic populations. Nonetheless, clinicians and researchers must carefully consider their objectives and the appropriateness of their populations in selecting a sleep questionnaire to meet their needs.

EPWORTH SLEEPINESS SCALE (ESS)

Description

Purpose.

To measure daytime sleepiness (16).

Content.

The ESS is intended to measure the single factor of “somnificity.” The instrument asks subjects to rate “in recent times” how likely they would be to “doze off or fall asleep” in 8 different common situations of daily living, such as “sitting and reading” or “watching TV.” The ESS asks respondents to “try to work out how they would have affected you,” even if they have not done a given activity recently.

Number of items.

8 items.

Response options/scale.

The questionnaire has a 4-point Likert response format (0 = would never doze, 1 = slight chance of dozing, 2 = moderate chance of dozing, and 3 = high chance of dozing).

Recall period for items.

“Recent times.” Further specificity is not provided.

Endorsements.

No.

Examples of use.

The ESS has been used frequently in studies of obstructive sleep apnea (OSA), but has also been applied to study sleepiness related to Parkinson's disease (17), multiple sclerosis (18), asthma (19), gastroesophageal reflux (20), and multiple other chronic diseases. Its usage in the rheumatologic literature has been more limited than in primary sleep disorders, but it has been applied in examining the effects of chronic pain on sleepiness (21, 22).

Practical Application

How to obtain.

The survey instrument is available in the original validating publication (16), and is also available at http://epworthsleepinessscale.com. An annual license fee may be applicable if usage is “deemed commercial in nature.” Permission to use can be obtained from Murray W. Johns, PhD, who can be contacted through the above web site or at Epworth Sleep Centre, Melbourne, Victoria, Australia. E-mail: mjohns@optalert.com.

Method of administration.

Written survey instrument.

Scoring.

The 8 Likert response items are summed to calculate a total score.

Score interpretation.

Score range is 0–24, with higher scores indicating greater daytime sleepiness. Scores ≥11 are generally considered abnormal, or positive for excessive daytime sleepiness (EDS). This criterion for EDS was based on a mean ± SD score of 4.5 ± 2.8 among 72 healthy Australian workers (23).
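The scoring and interpretation rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not an official scoring program: the function and constant names are hypothetical, and the example responses are invented for demonstration only.

```python
from typing import Sequence

ESS_ITEM_COUNT = 8       # the ESS has 8 items
ESS_EDS_THRESHOLD = 11   # scores >= 11 are generally considered abnormal


def score_ess(responses: Sequence[int]) -> int:
    """Sum the 8 item responses (each 0-3) into a total score of 0-24."""
    if len(responses) != ESS_ITEM_COUNT:
        raise ValueError(f"expected {ESS_ITEM_COUNT} responses, got {len(responses)}")
    for value in responses:
        if value not in (0, 1, 2, 3):
            raise ValueError(f"each response must be 0, 1, 2, or 3; got {value!r}")
    return sum(responses)


def has_excessive_daytime_sleepiness(total: int) -> bool:
    """Apply the commonly used cutoff for excessive daytime sleepiness (EDS)."""
    return total >= ESS_EDS_THRESHOLD


# Invented example responses (not data from any study).
example = [1, 2, 0, 3, 1, 2, 1, 2]
total = score_ess(example)
print(total, has_excessive_daytime_sleepiness(total))
```

The sketch rejects incomplete questionnaires rather than imputing values, because the text does not describe a rule for handling missing ESS items.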

Respondent burden.

2–3 minutes.

Administration burden.

Time to score is <1 minute.

Translations/adaptations.

The ESS has been translated and validated in multiple languages, including Spanish, German, Mandarin Chinese, Turkish, and Greek (24–28).

Psychometric Information

Method of development.

The 8 situations assessed for likelihood of falling asleep were selected based on earlier research regarding low-stimulating environments that were likely to be soporific (29).

Acceptability.

Item-response rates are reported to be high, with Johns and Hocking reporting <1% of surveys having missing data (23). In a recent study, score distributions were reasonably normal among community-dwelling US adults, with a mean ± SD score of 8.2 ± 3.9 (30).

Reliability.

There was adequate internal consistency with Cronbach's alpha (range α = 0.74–0.88) (31, 32). Test–retest reliability was reported to be high based on testing separated in time by 5 months in healthy subjects (r = 0.82, P < 0.001) (31). In subjects with OSA, with testing separated by an average of 71 days, r = 0.73 (P < 0.001) (33).
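For readers less familiar with the internal consistency statistic quoted here and in later sections, the following Python sketch computes Cronbach's alpha using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores). The function name and the toy data are hypothetical and are not taken from any of the cited studies.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items table of scores.

    rows[r][i] is the score of respondent r on item i.
    """
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in rows])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Toy data (invented): two items that move together perfectly give alpha = 1.
toy = [[0, 0], [1, 1], [2, 2]]
print(cronbach_alpha(toy))
```

Values closer to 1 indicate that the items vary together, which is the sense in which the reported range of 0.74–0.88 is described as adequate.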

Validity.

Concurrent validity of the ESS has been assessed as its correlation with mean sleep latency on multiple sleep latency tests (MSLT) in which subjects are asked to take a series of brief naps over the course of several hours. In such studies, the ESS showed correlations in the expected directions of between 0.30 and 0.37 (34, 35). Although this correlation is not exceptionally high, the validity of the ESS has also been argued based on evidence that it predicts, better than MSLT, the presence of narcolepsy, a condition which is by definition associated with excessive daytime somnolence (36). The validity of the ESS has also been established based on its association with the Respiratory Disturbance Index among OSA patients, and its responsiveness to treatment in OSA (16, 31).

Ability to detect change.

Based on results from clinical trials, the ESS is sensitive to change, with therapies thought to reduce sleepiness showing improvements in ESS scores (17, 18, 37). Minimal clinically important differences are not reported.

Discussion

The ESS is one of the most widely used measures, both clinically and in sleep medicine research, with the original validation article having been referenced more than 3,000 times in peer-reviewed publications. Its attractiveness is based in part on its ease of administration, as well as the simplicity of the concept it is measuring, daytime sleepiness. Although the MSLT is considered by many to be the gold standard for measuring sleepiness (34), it is often not practical for research or clinical purposes. By specifically asking about the likelihood of falling asleep in various situations, rather than the effects of sleepiness on daily activities, the ESS may hold some theoretical advantages in distinguishing fatigue from sleepiness, where fatigue is defined as a subjective lack of physical or mental energy to carry out desired activities (38). This may be important in rheumatologic diseases, which might be expected to cause significant fatigue independent of sleepiness, although the application of the ESS to rheumatologic conditions has been relatively limited, and validation of this distinction has not been established. An additional caution is that the ESS cannot distinguish between sleepiness as a result of disturbed sleep and sleepiness resulting from other causes, such as medication effects.

FUNCTIONAL OUTCOMES OF SLEEP QUESTIONNAIRE (FOSQ)

Description

Purpose.

To assess the impact of excessive sleepiness on functional outcomes relevant to daily behaviors and sleep-related quality of life (39).

Content.

The instrument asks subjects if they have had difficulty performing specific activities because of “being sleepy or tired.” It provides instructions to respondents informing them that the words “sleepy” and “tired” mean “the feeling that you can't keep your eyes open, your head is droopy, that you want to ‘nod off,’ or that you feel the urge to take a nap. These words do not refer to the tired or fatigued feeling you may have after you have exercised.”

In 30 items, the FOSQ then assesses difficulty, due to sleepiness, in performing activities of daily living and recreational activities, which are categorized into the following 5 subscales: 1) activity level (9 items), 2) vigilance (7 items), 3) intimacy and sexual relationships (4 items), 4) general productivity (8 items), and 5) social outcomes (2 items). A shorter 10-item version, the FOSQ-10, was published in 2009 using selected items from each subscale and providing the same definition of sleepy and tired (40). Items for the FOSQ-10 are distributed among the same subscales as follows: 1) activity level (3 items), 2) vigilance (3 items), 3) intimacy and sexual relationships (1 item), 4) general productivity (2 items), and 5) social outcomes (1 item). However, because of the limited number of items in each subscale for the FOSQ-10, the authors recommend that only the total score for the FOSQ-10 be utilized, rather than individual subscales.

Number of items.

There are 30 items in the original FOSQ-30, and 10 items in the FOSQ-10.

Response options/scale.

The questionnaire has a 4-point Likert response format (e.g., 1 = extreme difficulty, 2 = moderate difficulty, 3 = a little difficulty, and 4 = no difficulty). A response alternative is also available for respondents to indicate that they do not engage in the activity for reasons other than being sleepy or tired.

Recall period for items.

Not specified. Question stems imply current difficulty.

Endorsements.

No.

Examples of use.

The FOSQ-30 has been used to assess response to therapies in randomized clinical trials (37, 41, 42) or prospective cohort studies (43) and to assess the impact of known or suspected sleep disturbances on daytime function (44–48). For example, Burke et al report that although opioid-dependent individuals reported significant sleep disturbance, such sleep disturbance did not appear to affect daily functioning as assessed by the FOSQ (45). The FOSQ has been applied to a limited extent in populations with rheumatologic disease (49, 50). The FOSQ is frequently used as a measure of sleep-specific health-related quality of life.

Practical Application

How to obtain.

Available from the authors. Permission for use is required. Contact Terri E. Weaver, PhD, RN, University of Illinois at Chicago, 845 South Damen Avenue, MC 802, Chicago, IL 60612. E-mail: teweaver@uic.edu.

Method of administration.

Self-administered written questionnaire.

Scoring.

For both the FOSQ-30 and FOSQ-10, an average score is calculated for each subscale, and the 5 subscales are totaled to produce a total score. Missing responses, and responses from activities in which the respondent does not participate regularly “for reasons other than being sleepy or tired,” are not included in the score calculation (i.e., not included in the calculation of average value for subscales). Therefore, missing responses do not necessarily prevent score calculation. Subscale scores for both the FOSQ-10 and FOSQ-30 range from 1–4 with total scores ranging from 5–20.
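The scoring rule described above (average within each subscale, skip missing or not-applicable responses, then sum the 5 subscale averages) can be sketched as follows. This is an illustrative sketch only: the subscale keys and function name are hypothetical, `None` stands in for both a missing response and the "do not engage in the activity for other reasons" option, and returning `None` when an entire subscale is unanswered is an assumption, since the text does not state how that case is handled.

```python
from typing import Dict, List, Optional

SUBSCALES = ("activity_level", "vigilance", "intimacy", "productivity", "social")


def score_fosq(responses: Dict[str, List[Optional[int]]]) -> Optional[float]:
    """Return the FOSQ total score (5-20), or None if a subscale has no usable items.

    Each answered item is scored 1 (extreme difficulty) to 4 (no difficulty).
    None marks a missing or not-applicable response and is excluded from the mean.
    """
    total = 0.0
    for name in SUBSCALES:
        answered = [x for x in responses[name] if x is not None]
        for value in answered:
            if value not in (1, 2, 3, 4):
                raise ValueError(f"{name}: responses must be 1-4, got {value!r}")
        if not answered:
            return None  # assumption: total is not computed without all 5 subscales
        total += sum(answered) / len(answered)  # subscale score, range 1-4
    return total


# Invented FOSQ-10-shaped example (3, 3, 1, 2, and 1 items per subscale).
example = {
    "activity_level": [4, 3, None],   # mean of answered items = 3.5
    "vigilance": [2, 2, 2],           # 2.0
    "intimacy": [4],                  # 4.0
    "productivity": [3, 4],           # 3.5
    "social": [1],                    # 1.0
}
print(score_fosq(example))
```

Because each subscale contributes its mean rather than its sum, a subscale with many items (such as activity level in the FOSQ-30) carries the same weight in the total as one with few items.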

Score interpretation.

Score range is 5–20 points, with higher scores indicating better functional status.

Respondent burden.

The FOSQ is written at a fifth-grade reading level. Time to complete the FOSQ-30 is reported to be 15 minutes (39). Time to complete the FOSQ-10 is not reported. Although the FOSQ-10 has one-third the number of questions, it may take longer than one-third of the time of the FOSQ-30 to administer, given that the length of the instructions defining sleepy and tired is unchanged.

Administration burden.

Time to score is not reported, but is estimated here to be ∼3–5 minutes if done by hand.

Translations/adaptations.

The FOSQ-30 has been translated and validated in peer-reviewed publications in multiple languages including Spanish, German, Turkish, and Norwegian (51–55). Multiple other translated versions of the FOSQ-30, although not specifically validated in peer-reviewed publications, are also available from the authors.

Psychometric Information

Method of development.

Based on Granger's model of disability, 74 items were originally identified and tested in 3 distinct cohorts, consisting largely of participants with either confirmed sleep apnea or those referred to sleep disorders clinics. Forty-four items were then eliminated because 1) a high level of agreement between questions about degree of difficulty and frequency of symptoms led to elimination of questions about frequency of symptoms, 2) certain items reduced the reliability (Cronbach's alpha) of the subscales and were therefore eliminated, and 3) items which did not meet the loading criterion of >0.40 were eliminated.

Acceptability.

Information on the number of missing items was not reported in original FOSQ development, although a given respondent's total score and subscale scores are not invalidated by missing items. Scores may cluster toward the high-end of the FOSQ range (scores 5–20), especially in populations selected from the community or without sleep complaints. Among older community-dwelling adults, Gooneratne et al report that the mean ± SD FOSQ total score was 19.29 ± 0.67 among subjects without excessive daytime sleepiness (EDS; based on Epworth Sleepiness Scale scores) and was 17.91 ± 2.00 among subjects with EDS (56). Nonresponse may be a problem for questions related to intimacy and sexual activity, since a majority of respondents in that study did not answer these questions (56).

Reliability.

In their original development paper, Weaver et al report a high internal consistency with Cronbach's alpha (α = 0.95) for the 30-item FOSQ, after elimination of items that reduced the Cronbach's alpha (39). For the FOSQ-10, Cronbach's alpha was α = 0.87 (40). Test–retest reliability for the FOSQ-30 was high, based on testing separated by 1 week without interval intervention (r = 0.90).

Validity.

Concurrent validity of the FOSQ-30 was established based on moderate correlation with the Sickness Impact Profile (SIP), a general (not disease-specific) measure of functional status outcomes, and the Short Form 36 (SF-36) health survey. FOSQ subscales generally correlated more highly with related SIP and SF-36 subscales and less with unrelated SIP and SF-36 subscales. Discriminant validity was established based on differences in scores between respondents seeking evaluation for sleep disorders and individuals without sleep complaints (t-test −5.88, P < 0.001) (39).

The FOSQ-10 total score was robustly associated with the FOSQ-30 total score, (r = 0.96, P < 0.0001), explaining 92% of the variance of the longer version. The subscales of the FOSQ-10 and FOSQ-30 were also highly correlated with Pearson's correlation coefficient as r = 0.83–0.97 (P < 0.0001 for all) (40). Scores on the FOSQ-10 were also significantly lower in untreated sleep apnea patients (mean ± SD 12.48 ± 3.23) as compared to controls without sleep disorders (mean ± SD 17.81 ± 3.10) (P < 0.0001), suggesting discriminant validity.
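The link between the reported correlation and the "92% of the variance" figure is simply the square of Pearson's r. A one-function Python sketch (the function name is hypothetical) makes the arithmetic explicit:

```python
def variance_explained(r: float) -> float:
    """Proportion of variance shared by two measures with Pearson correlation r."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    return r ** 2


# r = 0.96 between FOSQ-10 and FOSQ-30 total scores, as reported above.
print(f"{variance_explained(0.96):.0%}")
```

The same calculation applied to the subscale correlations of r = 0.83–0.97 gives roughly 69% to 94% shared variance.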

Ability to detect change.

Sensitivity to change has been demonstrated in clinical trials showing improvements in the FOSQ-30 resulting from therapies such as modafinil or positive airway pressure therapy (37, 42). The FOSQ-10 has also shown improvements resulting from positive airway pressure therapy in patients with sleep apnea (40). Minimal clinically important differences are not reported.

Discussion

The FOSQ is a widely used measure of functional status resulting from sleepiness and has been effectively employed as a measure of sleep-related quality of life. It has been applied most often in the context of primary sleep disorders, sleep apnea in particular, but it is not specific for any particular disease. As with the Epworth Sleepiness Scale, the FOSQ cannot distinguish between impairment resulting from disturbed sleep and that due to medications such as opiates. The FOSQ has not specifically been validated in rheumatologic populations or applied widely in cohorts with rheumatologic disease. Nonetheless, investigators intending to determine the extent to which rheumatologic diseases impair health-related quality of life (HRQOL) due to sleepiness or disturbed sleep may find the FOSQ to be a useful outcome, since many other measures of sleep-related HRQOL are specific to sleep apnea or primary sleep disorders (57). One strength of the FOSQ is its inquiry about items related to intimacy and sexual function, a subject area not captured in many instruments. However, nonresponse to these items may present a problem, as indicated in one study (56).

The FOSQ-10, a shorter version of the FOSQ, was published in 2009, and its total score and individual subscales correlated highly with those of the FOSQ-30. Further validation and examples of implementation are not yet available, but this may be an appealing version if the FOSQ-30 is not practical because of length.

INSOMNIA SEVERITY INDEX (ISI)

Description

Purpose.

To be a brief self-report instrument measuring self-perception of insomnia symptoms as well as the degree of concerns or distress caused by those symptoms.

Content.

Content of the ISI corresponds in part to the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) diagnostic criteria for insomnia. In a 7-item questionnaire, with 1 item for each of the following categories, the ISI assesses 1) difficulty with sleep onset, 2) difficulty with sleep maintenance, 3) problem with early awakening, 4) satisfaction with sleep pattern, 5) interference with daily functioning as a result of sleep problems, 6) noticeability of sleep problem to others, and 7) degree of distress caused by sleep problem.

Number of items.

7 items.

Response options/scale.

Each item has a 5-point Likert response format.

Recall period for items.

Last 2 weeks.

Endorsements.

No.

Examples of use.

The ISI was developed to be an outcomes measure for insomnia research and has frequently been used as an outcome in clinical trials, both of pharmacologic therapies and behavioral interventions (58–64). It has also been used to identify morbidity and poor outcomes associated with insomnia, including in rheumatologic diseases (65, 66).

Practical Application

How to obtain.

The written questionnaire was published in the original validation study (67). Permission for usage can be obtained from the author. Contact Charles M. Morin, PhD, Université Laval and Centre de recherche Université Laval-Robert Giffard, Québec, Canada. E-mail: cmorin@psy.ulaval.ca.

Method of administration.

Authors report that ISI is available in 3 forms: written questionnaire for self-administration, written questionnaire for significant other administration, and clinician administration. The self-administered version was the primary focus of validation (67), and this review also focuses on that version, except where otherwise noted.

Scoring.

The 7 Likert response items are summed to determine total score.

Score interpretation.

The score range is 0–28 points, with higher scores indicating greater insomnia severity. The suggested guidelines for score interpretation are 0–7 for no clinically significant insomnia, 8–14 for subthreshold insomnia, 15–21 for clinical insomnia (moderate severity), and 22–28 for clinical insomnia (severe). However, empiric validation of these guidelines is required. Savard et al recommend a cutoff score of 8 for detection of sleep difficulties, which yielded a sensitivity of 94.7% and a specificity of 47.4% among cancer patients based on a gold standard of the Insomnia Interview Schedule, a semistructured interview based on DSM-IV criteria (68). Recommended cutoff scores for other populations have not been well established empirically.
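The summation and the suggested interpretation bands above can be sketched in Python as follows. This is a minimal illustration with hypothetical function names and invented example responses. It assumes each of the 7 items is coded 0–4, which is consistent with the 5-point response format and the 0–28 total range, and it encodes the suggested guidelines only, which the text notes still require empiric validation.

```python
from typing import Sequence

ISI_ITEM_COUNT = 7


def score_isi(responses: Sequence[int]) -> int:
    """Sum the 7 item responses (each assumed coded 0-4) into a total of 0-28."""
    if len(responses) != ISI_ITEM_COUNT:
        raise ValueError(f"expected {ISI_ITEM_COUNT} responses, got {len(responses)}")
    for value in responses:
        if value not in (0, 1, 2, 3, 4):
            raise ValueError(f"each response must be 0-4; got {value!r}")
    return sum(responses)


def interpret_isi(total: int) -> str:
    """Map a total score onto the suggested interpretation bands."""
    if not 0 <= total <= 28:
        raise ValueError("total must be between 0 and 28")
    if total <= 7:
        return "no clinically significant insomnia"
    if total <= 14:
        return "subthreshold insomnia"
    if total <= 21:
        return "clinical insomnia (moderate severity)"
    return "clinical insomnia (severe)"


# Invented example responses (not data from any study).
total = score_isi([2, 3, 1, 3, 2, 1, 3])
print(total, interpret_isi(total))
```

A study using the cutoff of 8 recommended by Savard et al would flag every total in the subthreshold band and above, which helps explain the high sensitivity and low specificity reported for that cutoff.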

Respondent burden.

Time to complete is <5 minutes.

Administration burden.

Time to score is <1 minute.

Translations/adaptations.

French-Canadian, Spanish, and Chinese versions have been validated (68–70). Only the clinician-administered version was validated in Chinese.

Psychometric Information

Method of development.

Items for the ISI were selected based on DSM-IV and International Classification of Sleep Disorders criteria for insomnia. The ISI was based closely on the Sleep Impairment Index, an earlier measure developed by Morin (71, 72).

Acceptability.

A floor effect may be present in populations with low prevalence of insomnia symptoms. Among French-Canadian cancer patients, the mean ± SD ISI score was 7.3 ± 6.3 (68). However, among patients referred to a sleep clinic for insomnia, scores were less skewed with a mean ± SD score of 15.4 ± 4.2 (67). Among primary-care Chinese-speaking older adults, the mean ± SD score was 10.4 ± 5.2 (70). Information about missing items and educational attainment of subjects was not presented in validation studies (67).

Reliability.

Adequate internal consistency is suggested by a Cronbach's alpha of α = 0.76 at baseline in the original validation study, α = 0.81 among community-dwelling older Chinese patients, and α = 0.90 among French-Canadian cancer patients (67, 68, 70). Savard et al report that among cancer patients, the test–retest reliability (Pearson's correlation coefficient) is r = 0.83 (P < 0.0001) after 1 month, r = 0.77 (P < 0.0001) after 2 months, and r = 0.73 (P < 0.0001) after 3 months (68).

Validity.

Construct validity.

Because the ISI is based on DSM-IV criteria, it has good face validity. A principal component analysis yielded 3 components consistent with diagnostic criteria for insomnia (impact, severity, and satisfaction) that explained 72% of the total variance (67). Among cancer patients, 2 factors corresponding to severity and impact were identified (68).

Concurrent validity.

Bastien and colleagues provided evidence for concurrent validity as correlation between ISI and sleep diary variables, where r = −0.35 (P < 0.05) at baseline for correlation between ISI and sleep efficiency (defined as percentage of time asleep when in bed), as recorded in a sleep diary over a period of 1–2 weeks. Correlation with sleep diary was higher after insomnia treatment, with r = −0.60 (P < 0.05). The ISI was not correlated with sleep efficiency as recorded on polysomnography (PSG) in a sleep laboratory over 3 consecutive nights (r = 0.09, P > 0.05), although the ISI sleep onset item was correlated with time to sleep onset as recorded by PSG (r = 0.45, P < 0.05) (67).

Ability to detect change.

Sensitivity to change.

When comparing the change (pretreatment for insomnia versus posttreatment) in the ISI score, the correlation for ISI change was r = −0.37 (P < 0.05) as compared with the change in sleep efficiency recorded by sleep diary, and r = −0.36 (P < 0.01) as compared with change in sleep efficiency recorded in sleep laboratory on PSG (67). In trials of pharmacologic therapies for insomnia, the ISI has also demonstrated sensitivity to change. For example, in a 6-month randomized double-blind trial, the ISI declined among eszopiclone users, from mean ± SD 17.9 ± 4.1 at baseline to 8.3 ± 6.0 at 6 months. In the placebo group, the mean ± SD ISI score was 17.8 ± 4.1 at baseline and 12.9 ± 5.7 at 6 months (P < 0.0001 for difference between groups at 6 months).

Minimum clinically important difference (MCID).

An MCID of 6 points has been recommended based on an analysis that demonstrated such an improvement in scores was associated with the following quality anchors: 48% reduction in likelihood of “feeling worn out” at 6 months (from the Short Form 36 Health Survey), 46% less likely to be “able to think clearly” (from the Work Limitations Questionnaire), and 52% less likely to report “feeling fatigued” (from the Fatigue Severity Scale). A 6-point change was equivalent to 1.5 SDs in this study (73).

Discussion

The ISI has high face validity, is a relatively short instrument, and has been used extensively in clinical research. It has been validated in a number of different cohorts, both those referred for insomnia symptoms, as well as cohorts selected outside of sleep referral centers. The suggested guidelines for classifying insomnia require further validation, and based on the research of Savard and colleagues, there does not appear to be a clear threshold above which clinical insomnia can be diagnosed with high certainty but below which it can also be excluded with confidence (68). Moreover, and particularly relevant to research in rheumatologic diseases, the instrument does not distinguish between causes of insomnia, whether psychophysiologic in origin or related to pain or other symptoms from medical comorbidity. Nonetheless, it has been used effectively in populations with comorbid disease, including cohorts with rheumatologic diseases, and is a useful and brief instrument.

PITTSBURGH SLEEP QUALITY INDEX (PSQI)

Description

Purpose.

To measure sleep quality and disturbances over the prior month and to discriminate between “good” and “poor” sleepers (74).

Content.

The PSQI consists of 7 components: subjective sleep quality (1 item), sleep latency (2 items), sleep duration (1 item), habitual sleep efficiency (3 items), sleep disturbances (9 items), use of sleeping medications (1 item), and daytime dysfunction (2 items).

Number of items.

Nineteen items are included in scoring. Five additional items, to be completed by a bed partner, are included in the questionnaire and may be useful for clinical purposes but are not used for scoring.

Response options.

Of the 19 items included in scoring, items 1–4 have free-entry responses asking for usual bedtime and wake up times, number of minutes to fall asleep, and hours slept per night. Items 5–17 have 4-point Likert scale responses relating to frequency of specified sleep problems. Item 18 has a 4-point Likert scale response relating to overall assessment of sleep quality (“very good,” “fairly good,” “fairly bad,” or “very bad”). Item 19 has a 4-point Likert response scale relating to the respondent's overall assessment of “enthusiasm to get things done” (“no problem at all,” “only a very slight problem,” “somewhat of a problem,” or “a very big problem”).

Recall period for items.

Last month.

Endorsements.

No.

Examples of use.

In multiple disease areas, the PSQI has often been used as an outcome in clinical trials of interventions intended to reduce sleep disturbances (75–81). It has been used in clinical trials to define inclusion criteria for poor sleep quality (e.g., participants with PSQI scores >5 were eligible for inclusion) (82). The PSQI has also been used to determine the impact of a particular sleep disturbance, such as nocturnal hypoxemia in chronic obstructive pulmonary disease, on sleep quality (44). The PSQI has been used as an outcome in epidemiologic studies intending to determine risk factors for, or prevalence of, poor sleep quality in various populations, including those with rheumatoid arthritis, chronic pain, fibromyalgia, and chronic opiate usage (22, 83–86).

Practical Application

How to obtain.

Questionnaire and scoring instructions are available in the appendix of the original validating publication (74). Permission for use can be obtained from the author, Daniel J. Buysse, MD, University of Pittsburgh, 3811 O'Hara Street, E-1127, Pittsburgh, PA 15213. E-mail: buyssedj@upmc.edu.

Method of administration.

Self-administered written questionnaire.

Scoring.

Each of the 7 component scores is determined based on scoring algorithms, with the 7 component scores each yielding a score of 0–3. A PSQI global (total) score is obtained by summing each of the 7 component scores. Scoring algorithms for each component involve a mixture of averaging Likert response scores, categorization of free-text responses (e.g., sleep latency of 15–30 minutes = 1 point), and arithmetic determination of sleep efficiency based on free-text responses.

Score interpretation.

Score range is 0–21 points, with higher scores indicating worse sleep quality. In the original validation report, a PSQI global score >5 correctly classified 88.5% of subjects as “good sleepers” versus “poor sleepers,” with a sensitivity of 89.6% and a specificity of 86.5% (74). However, accuracy has been lower in other populations: 1) a threshold score of 5 was 72% sensitive and 55% specific among Nigerian university students (87), and 2) in a heterogeneous population (most with history of malignancy or renal transplant), a threshold score of 8 appeared more appropriate (88). Among Chinese-speaking patients, a PSQI score >5 was 98% sensitive and 55% specific for insomnia (89).
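The arithmetic parts of PSQI scoring lend themselves to automation. The Python sketch below illustrates three of them: deriving time in bed from the free-entry bedtime and wake-up time, computing sleep efficiency as the percentage of time in bed spent asleep, and summing the 7 component scores into the global score with the >5 threshold from the original validation. All function names are hypothetical and the example values are invented. The published algorithms that convert raw responses into each 0–3 component score are deliberately not reproduced here; the sketch takes the component scores as given.

```python
from typing import Sequence


def hours_in_bed(bedtime: str, wake_time: str) -> float:
    """Hours between bedtime and wake-up time, both 'HH:MM' on a 24-hour clock."""
    def to_minutes(clock: str) -> int:
        hours, minutes = clock.split(":")
        return int(hours) * 60 + int(minutes)

    # The modulo handles a bedtime before midnight and a wake-up time after it.
    elapsed = (to_minutes(wake_time) - to_minutes(bedtime)) % (24 * 60)
    return elapsed / 60


def sleep_efficiency_percent(hours_asleep: float, bedtime: str, wake_time: str) -> float:
    """Percentage of time in bed that was spent asleep."""
    in_bed = hours_in_bed(bedtime, wake_time)
    if in_bed == 0:
        raise ValueError("bedtime and wake-up time are identical")
    return 100 * hours_asleep / in_bed


def psqi_global(components: Sequence[int]) -> int:
    """Sum the 7 component scores (each 0-3) into a global score of 0-21."""
    if len(components) != 7:
        raise ValueError("expected 7 component scores")
    for value in components:
        if value not in (0, 1, 2, 3):
            raise ValueError(f"component scores must be 0-3; got {value!r}")
    return sum(components)


def is_poor_sleeper(global_score: int, threshold: int = 5) -> bool:
    """Scores above the threshold classify a respondent as a 'poor sleeper'."""
    return global_score > threshold


# Invented example: in bed 23:00 to 07:00, asleep for 6 hours.
print(sleep_efficiency_percent(6, "23:00", "07:00"))
score = psqi_global([1, 2, 1, 1, 1, 0, 1])
print(score, is_poor_sleeper(score))
```

The threshold is left as a parameter because, as noted above, a cutoff of 8 appeared more appropriate in at least one population.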

Respondent burden.

Time to complete is reported to be 5–10 minutes (74).

Administration burden.

Time to score is reported to be 5 minutes (74). Because of the need to integrate various responses and calculate such variables as sleep efficiency, hand-calculation of scores may be somewhat burdensome, but a scoring algorithm can readily be incorporated into statistical programming software or a spreadsheet for automated calculation.

Translations/adaptations.

Validated versions of the PSQI are available in Spanish, French, Japanese, Chinese, Greek, German, Hebrew, Persian, and Arabic (89–98).

Psychometric Information

Method of development.

The PSQI was derived from “clinical intuition and experience with sleep disorder patients; a review of previous sleep quality questionnaires reported in the literature; and clinical experience with the instrument during 18 months of field testing” (74).

Acceptability.

Total scores appear reasonably normal in distribution in both healthy populations and in those with higher frequency of sleep disturbances (74). Buysse et al report that 6.3% of 158 respondents failed to give complete responses to all items and scores could not therefore be calculated. In a validating study among cancer patients, PSQI scores for 21% of respondents could not be calculated due to missing responses. The presence of free-text items is associated with greater nonresponse; the plurality of missing items reported by Beck et al (99) was due to missing free-text responses necessary to calculate sleep efficiency. Interviewer follow-up after completion of the questionnaire to query about missing items reduced the percentage of scores that could not be calculated to 4.2%.

Reliability.

In the original validation study, the 7 component scores of the PSQI had an overall Cronbach's alpha of 0.83, and individual items were strongly correlated with one another, also with α = 0.83 (74). In separate studies with different populations, Cronbach's alpha values have been similar (88, 99). Test–retest reliability (Pearson's correlation coefficient) for the global PSQI was 0.85 (P < 0.001) when testing was separated by ∼4 weeks (74). Among German-speaking respondents with insomnia, the test–retest Pearson's correlation coefficients were 0.90 and 0.86, based on testing separated in time by 2 days and a mean of 45.6 days, respectively (97).
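For readers less familiar with these reliability statistics, the following is a minimal sketch of how Cronbach's alpha and a test–retest Pearson correlation are computed from raw scores. The function names and the example data are hypothetical and are not drawn from any of the cited studies. Alpha is computed with the standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), where k is the number of items or components.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's k item scores."""
    k = len(rows[0])
    # Sample variance of each item (column) across respondents.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation, e.g., between test and retest global scores."""
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mx) ** 2 for a in x))
    spread_y = sqrt(sum((b - my) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Hypothetical component scores (0-3) for 5 respondents on 7 components.
component_scores = [
    [0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 0, 1, 0, 1],
    [2, 2, 1, 1, 2, 1, 2],
    [3, 2, 2, 2, 2, 1, 3],
    [1, 0, 1, 0, 1, 0, 0],
]
alpha = cronbach_alpha(component_scores)

# Hypothetical global scores for the same respondents at test and at retest.
test_scores = [3, 5, 11, 15, 3]
retest_scores = [4, 5, 10, 14, 5]
retest_r = pearson_r(test_scores, retest_scores)
```

Note that a Pearson coefficient captures only the linear association between the two administrations; it does not by itself show that scores were unchanged in absolute terms.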

Validity.

Criterion validity.

Based on the gold standard of clinical evaluation, the PSQI distinguished “good sleepers” from “poor sleepers” with reasonable accuracy in its original validation, which was a chief basis for demonstrating initial validity (see Score Interpretation section above) (74).

Concurrent validity.

In the original validation, the sleep latency component of the PSQI was modestly correlated with sleep latency on single-night polysomnography (PSG) (r = 0.33, P < 0.001), and global PSQI scores were also weakly correlated with PSG sleep latency (r = 0.20, P < 0.01). Other correlations with PSG results were, for the most part, not significant (74), and in a recent study, Buysse et al concluded that the PSQI is not likely to be useful as a screening measure for PSG sleep abnormalities (30). A variety of other studies have demonstrated PSQI concurrent validity: 1) PSQI component scores were correlated with sleep duration (r = 0.81) and sleep latency (r = 0.71) as assessed by daily sleep diaries among insomnia patients (97), 2) PSQI global scores were correlated with the Insomnia Severity Index (r = 0.76) among Arabic-speaking patients (96), and 3) PSQI global scores were correlated with sleep-related items from the Symptoms Experience Report and with sleep-related items from the Center for Epidemiologic Studies Depression Scale (88).

Factor validity.

Based on the original formulation of the PSQI as a measure of sleep quality, Buysse et al suggested that its 7 components be combined into a single factor, the PSQI global score (74). However, in a factor analysis later conducted by Cole et al (including Daniel Buysse, lead author of the original validation study), a 3-factor scoring model provided significantly better fit than the original single-factor model, where the 3 factors are sleep efficiency, perceived sleep quality, and daily disturbances (100). Such a scoring model has not thus far been widely accepted and has not yet been further validated.

Ability to detect change.

The PSQI has demonstrated sensitivity to change in clinical trials of interventions intended to reduce sleep disturbances, in which PSQI scores improved along with other sleep-related measures (75–80).

Discussion

The PSQI is a widely used measure of sleep quality that is more global in nature than other measures reviewed here. The PSQI includes elements of daytime dysfunction, captured more specifically in the FOSQ. Three of the 7 PSQI components (sleep latency, sleep duration, and sleep efficiency) are often elicited to identify evidence of insomnia (101). However, unlike the ISI, these 3 components are based largely on free-text numerical responses that are used to quantify these components, whereas the ISI asks, with Likert responses, about perceived respondent difficulties related to these components. The PSQI also includes 1 item inquiring about daytime sleepiness, although Buysse has argued that the PSQI and Epworth Sleepiness Scale correlate weakly with each other (r = 0.16) and measure orthogonal dimensions of sleep-wake symptoms (30). One strength of the PSQI is, therefore, the broad range of its coverage in measuring several aspects of sleep quality and combining these into a global score. One drawback is the potential disagreement about whether the PSQI represents a single factor (100).

Table 1. Summary table for sleep measures*

| Scale | Purpose/content | Method of administration | Respondent burden | Administrative burden | Score interpretation | Reliability evidence | Validity evidence | Ability to detect change | Strengths | Cautions |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ESS | Measures sleepiness as likelihood of falling asleep in various situations | Written | 2–3 minutes | <1 minute | Range 0–24; ≥11 is positive for EDS | α = 0.74–0.88†; test–retest reliability 0.82 after 5 months | Concurrent validity based on correlation with MSLT and ability to predict narcolepsy diagnoses | Sensitive to change in clinical trials; MCID not reported | Short; widely used; simple concept | Cannot discern if sleepiness is due to sleep disturbance or other causes, e.g., medications |
| FOSQ | Measures functional impairment, resulting from sleepiness, in ADLs and recreational activities | Written | 15 minutes for FOSQ-30 | 3–5 minutes | Range 5–20; higher scores = better functional status | α = 0.95†; test–retest reliability 0.90 after 1 week | Concurrent validity based on correlations with SIP and SF-36 subscales; discriminant validity based on ability to classify respondents with sleep disorders | Sensitive to change in clinical trials; MCID not reported | Widely used; measures HRQOL related to sleepiness but not specific to any disease | Not widely applied in rheumatologic diseases; FOSQ-10 is a recently introduced shorter version but with limited application so far; questions about sexual function associated with higher nonresponse |
| ISI | Measures severity of insomnia symptoms as difficulty initiating and maintaining sleep and as perceived consequences of insomnia | Written or clinician-administered | <5 minutes | <1 minute | Range 0–28; higher scores indicate greater insomnia symptoms; suggested but not validated guidelines‡ | α = 0.76–0.90†; test–retest reliability 0.83 after 1 month | Concurrent validity based primarily on correlations with sleep diary | Sensitive to change in clinical trials; MCID proposed as 6 points = 1.5 SDs | Short; high face validity based on similarity to DSM-IV criteria for insomnia; widely used | Does not elucidate cause of insomnia, whether related to psychological factors, pain, or other symptoms |
| PSQI | Measures overall sleep quality across multiple dimensions, including insomnia symptoms, functional impairment, sleepiness, and causes of sleep disturbances | Written | 5–10 minutes | 5 minutes | Range 0–21; scores >5 indicate poor sleep quality | α = 0.83†; test–retest reliability 0.85 after 4 weeks | Criterion validity based on 88.5% accuracy in identifying good sleepers vs. poor sleepers; concurrent validity based on correlation with certain PSG variables, sleep diary, and other sleep-related instruments | Sensitive to change in clinical trials; MCID not reported | Broad measure of sleep quality capturing multiple dimensions; widely used | Potential disagreement about whether PSQI represents a single factor; free-text responses associated with higher nonresponse unless interviewer followup enacted |

  • * ESS = Epworth Sleepiness Scale; EDS = extreme daytime sleepiness; MSLT = multiple sleep latency tests; MCID = minimal clinically important difference; FOSQ = Functional Outcome of Sleep Questionnaire; ADL = activity of daily living; SIP = Sickness Impact Profile; SF-36 = Short Form 36 Health Survey; HRQOL = health-related quality of life; ISI = Insomnia Severity Index; DSM-IV = Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; PSQI = Pittsburgh Sleep Quality Index; PSG = polysomnography.

  • † Cronbach's alpha.

  • ‡ Guidelines: 0–7 no insomnia; 8–14 subthreshold insomnia; 15–21 clinical insomnia; 22–28 severe clinical insomnia.

AUTHOR CONTRIBUTIONS

Dr. Omachi drafted the article, revised it critically for important intellectual content, and approved the final version to be published.
