The subjective–objective mismatch in sleep perception among those with insomnia and sleep apnea

Authors


Correspondence

Matt T. Bianchi, Wang 7 Neurology, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA.

Tel.: 617-724-7426;

fax: 617-724-6513;

e-mail: mtbianchi@partners.org

Summary

The diagnosis and management of insomnia rely primarily on clinical history. However, patient self-report of sleep–wake times may not agree with objective measurements. We hypothesized that those with shallow or fragmented sleep would under-report sleep quantity, and that this might account for some of the mismatch. We compared objective and subjective sleep–wake times for 277 patients who underwent diagnostic polysomnography. The group included those with insomnia symptoms (n = 92), obstructive sleep apnea (n = 66) or both (n = 119). Mismatch of wake duration was context dependent: all three groups overestimated sleep latency but underestimated wakefulness after sleep onset. The insomnia group underestimated total sleep time by a median of 81 min. However, contrary to our hypothesis, measures of fragmentation (N1, arousal index, sleep efficiency, etc.) did not correlate with the subjective sleep duration estimates. To unmask a potential relationship between sleep architecture and subjective duration, we tested three hypotheses: N1 is perceived as wake; sleep bouts under 10 min are perceived as wake; or N1 and N2 are perceived in a weighted fashion. None of these hypotheses exposed a match between subjective and objective sleep duration. We show only modest performance of a Naïve Bayes Classifier algorithm for predicting mismatch using clinical and polysomnographic variables. Subjective–objective mismatch is common in patients reporting insomnia symptoms. We conclude that mismatch was not attributable to commonly measured polysomnographic measures of fragmentation. Further insight is needed into the complex relationships between subjective perception of sleep and conventional, objective measurements.

Introduction

Insomnia is broadly defined by both the International Classification of Sleep Disorders and the Diagnostic and Statistical Manual of Mental Disorders-IV as encompassing not only difficulty initiating or maintaining sleep, but also the experience of non-refreshing sleep not explained by another sleep disorder. According to these current conventions, insomnia is a clinical diagnosis without objective requirements, and laboratory polysomnography (PSG) is not generally required during routine evaluations unless another disorder is suspected.

While the diagnosis of insomnia is based on self-reported symptoms alone, when there are PSG data available, a striking finding is often present: that the subjective report of sleep–wake time does not coincide with the objective PSG data. Mismatch between the subjective and objective sleep–wake duration has been reported in people with insomnia (Bonnet and Arand, 1997; Edinger and Fins, 1995; Edinger and Krystal, 2003; Fernandez-Mendoza et al., 2011; Frankel et al., 1976; Manconi et al., 2010; Means et al., 2003).

Patients with insomnia – and the physicians who diagnose and treat them – face the challenge of understanding the disparity between subjective and objective sleep duration. Even when PSG is available to document the extent of the mismatch, whether the phenomenon varies from night to night or by context (lab versus home) remains uncertain. The dilemma includes the diagnostic categorization as well as the treatment plan. For example, should one prescribe a medication for sleep if the patient reports sleeping poorly despite normal PSG findings? In this regard, it is not surprising that patients with insomnia enjoy a subjective benefit of benzodiazepines for sleep duration that exceeds objective improvements in sleep duration (Holbrook et al., 2000), and this phenomenon has also been reported for cognitive behavioral therapy (Epstein et al., 2012).

Several terms have been employed to describe patients with insomnia complaints in the setting of otherwise normal sleep architecture: sleep state misperception; paradoxical insomnia; and subjective insomnia. The estimated prevalence of this form of insomnia, using the strict definition requiring that the complaint occurs in the setting of normal sleep architecture (i.e. normal percentages of sleep stages, sleep latency and sleep efficiency), with >50% overestimation of sleep latency and <50% underestimation of total sleep time (TST), is <5% of patients with insomnia according to the International Classification of Sleep Disorders (ICSD 2005). However, the extent of mismatch between subjective and objective sleep–wake times spans a continuum, and to variable degrees may occur with different subtypes of insomnia as described above, as well as other clinical disorders (Edinger and Krystal, 2003; Vanable et al., 2000).

The etiology of subjective–objective mismatch is likely to be complex and involve multiple factors, as outlined in the comprehensive review by Harvey and Tang (2012). For example, cognitive and psychological components have been proposed, including time perception (but others have shown no impairment of time estimation; Fichten et al., 2005; Rioux et al., 2006), personality characteristics (Bonnet and Arand, 1997; Edinger et al., 2000; Rioux et al., 2006; Rosa and Bonnet, 2000), mood (Bonnet and Arand, 1997; Edinger et al., 2000; Vanable et al., 2000) and memory (Perlis et al., 1997; Wyatt et al., 1997). In addition, factors related to sleep physiology have also been proposed, including physiological arousal (Bonnet and Arand, 2010), alpha-delta sleep (Martinez et al., 2010; although the evidence here is not consistent), cyclic alternating pattern (Parrino et al., 2009), and duration or stage of sleep (Ogilvie, 2001). Interestingly, there is evidence for cognitive distortions in the extent of mismatch when comparing immediate estimations upon morning waking versus retrospective reflection of the 7-day experiment. The immediate impressions were exaggerated (whether over- or underestimated) in the retrospective estimate (Fichten et al., 2005). This finding is particularly relevant to clinical assessment of patients with insomnia: if recall can become biased even after a short time, the reliability of retrospective reports of average sleep difficulty over time scales of months or years would be highly uncertain.

In this study, we analysed 277 patients who underwent clinical PSG in our laboratory and who had self-reported insomnia symptoms, objectively defined obstructive sleep apnea (OSA), or both. We compared the subjective sleep–wake estimates, obtained in the morning following PSG testing, with objective sleep parameters acquired overnight. We predicted that mismatch between subjective and objective measures would be more common among those reporting insomnia symptoms than those with sleep apnea, and that mismatch would be related to objective metrics of light or fragmented sleep. We included patients with sleep apnea because they may also exhibit mismatch even without insomnia complaints, and they also exhibit light and fragmented sleep, which may contribute to mismatch.

Materials and Methods

Patient population

This study was approved by our institution's Human Research Committee. Due to the retrospective nature of this study, consent was not required to investigate data obtained for clinical indications and stored in our database. We included men and women aged 18 years and older who underwent diagnostic laboratory PSG at our institution between March and December of 2009. The clinical indication for PSG was determined by referring physicians, with the majority (> 80%) of subjects being referred by non-sleep specialists. Most of the referrals listed OSA as the reason; in our analysis, we used patient self-reported data regarding insomnia (see below) rather than the referring provider information. Patients typically arrived at approximately 20:00 hours, with study initiation occurring in the range of 22:00–23:00 hours. Studies ended at approximately 06:00 hours. Subjective self-reporting of sleep–wake times was performed at the time the study ended. It is theoretically possible that patients with strong phase advance or delay might experience fragmentation due to sleep in the lab occurring partially out of phase with their habitual sleep times. Although our database does not contain sufficient information to definitively characterize circadian phase disorders, the possibility that some of these patients had a circadian disorder is included in our working hypothesis that fragmentation (of any cause) contributes to mismatch. Based on the pre-sleep inventories, the majority of patients reported habitual bed times between 21:00 hours and midnight (n = 12 reported later bed times between 12:30 and 02:00 hours, and three patients reported earlier bed times of 20:00 or 20:30 hours). Among the 12 reporting later bedtimes, the magnitude of TST underestimation was similar to the greater insomnia group (the early group was too small to make a meaningful assessment).

Certified technicians scored each study according to AASM criteria. There were 904 diagnostic studies performed in the considered time frame for a variety of clinical indications (excluding those involving split-night or full-night continuous positive airway pressure treatment for sleep apnea). All subjects completed a questionnaire prior to initiating sleep, including detailed information about their sleep habits, past medical history, medications and symptoms. Prior to analysis, we excluded patients from this initial dataset who met any of the following pre-specified criteria: presence of neurological disease, such as epilepsy, dementia, multiple sclerosis (based on self report, presence of medications used to treat these diseases or clinic records when available); self-reported psychiatric medications (antidepressant, anxiolytic, antipsychotic, mood stabilizing), prescription hypnotics, immunosuppressant medication, human immunodeficiency virus medication; non-English speaking. Patients taking anticonvulsants for non-epilepsy reasons such as migraine were excluded, although the migraine diagnosis itself was not an exclusion criterion. The number excluded for these reasons was 343. We also excluded subjects for whom either the pre-sleep questionnaire (clinical history) or post-sleep questionnaire (subjective assessment of sleep in the lab) forms were incomplete or missing (n = 249).

Of the 312 subjects included for analysis, 277 fell into three pre-defined categories as follows: insomnia symptoms with apnea–hypopnea index (AHI) < 5 and respiratory disturbance index (RDI) < 15 (n = 92), OSA alone (defined as AHI > 5 or RDI > 15; n = 66), or both (OSA + I; n = 119). The choice to use an AHI value of 5 is a conservative one, given that the AASM and the ICSD provide for the diagnosis of OSA at this AHI value provided that snoring, sleepiness or related symptoms are reported (Epstein et al., 2009). The remaining 35 had neither self-reported insomnia symptoms nor OSA (most of these were primary snoring), and were not further analysed here. Note that ‘insomnia alone’ means that patients self-reported insomnia symptoms as above, and did not have evidence of OSA based on AHI criteria (this category does not take into account the reason for referral to the sleep lab).
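The group assignment described above can be sketched as a small function. This is only an illustration of the stated thresholds: the function name and arguments are hypothetical, and the handling of values falling exactly on a threshold (AHI = 5 or RDI = 15) is an assumption, since the text does not specify it.

```python
def assign_group(has_insomnia_symptoms: bool, ahi: float, rdi: float) -> str:
    """Return 'insomnia alone', 'OSA alone', 'OSA + I' or 'neither'."""
    # OSA as defined in the text: AHI > 5 or RDI > 15.
    # Assumption: a value exactly on a threshold is treated as non-OSA.
    has_osa = ahi > 5 or rdi > 15
    if has_insomnia_symptoms and has_osa:
        return "OSA + I"
    if has_insomnia_symptoms:
        return "insomnia alone"
    if has_osa:
        return "OSA alone"
    return "neither"


# Cohort bookkeeping using the counts reported in the text.
analysed = 92 + 66 + 119      # three pre-defined categories
not_analysed = 312 - analysed  # neither insomnia symptoms nor OSA
```

With the reported counts, `analysed` evaluates to 277 and `not_analysed` to 35, matching the numbers given above.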

Self-reported data

We utilized self-reported symptoms from our standardized patient forms, administered before every clinical PSG performed in our laboratory, to determine the presence of insomnia symptoms. Patients were classified as having insomnia based on this intake form if they either: (i) selected ‘insomnia’ from a list of common reasons for undergoing PSG; or (ii) indicated >3 awakenings per night, or 1–3 per night if they also checked ‘it takes me a long time to fall back to sleep’; or (iii) selected one or more of ‘difficulty falling asleep’, ‘difficulty staying asleep’ or ‘waking too early’. Patients could thus be classified as having insomnia by a spectrum of possible answers according to our pre-defined criteria. While it is unlikely that anyone in our ‘no insomnia’ group actually has a significant complaint of insomnia, it is possible that some patients categorized as insomnia would not meet formal diagnostic criteria. We did not include duration or severity of insomnia symptoms; nor did we include non-refreshing sleep as an entry criterion for the insomnia category. This group is likely to be heterogeneous with regard to insomnia subclassification into ICSD types. Heterogeneity of insomnia subtypes, not all of which have subjective–objective mismatch, would be predicted to decrease our ability to demonstrate mismatch differences in these groups based on insomnia symptoms.
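The three entry criteria can be written as a single rule. The sketch below is illustrative only: the field names are hypothetical stand-ins for items on the intake form, and it assumes the number of awakenings is available as an integer count, which may not match how the form actually records it.

```python
from dataclasses import dataclass


@dataclass
class IntakeForm:
    # All field names are hypothetical; they mirror the items quoted in the text.
    reason_insomnia: bool = False            # chose 'insomnia' as reason for PSG
    awakenings_per_night: int = 0            # assumed to be an integer count
    slow_to_fall_back_asleep: bool = False   # 'it takes me a long time to fall back to sleep'
    difficulty_falling_asleep: bool = False
    difficulty_staying_asleep: bool = False
    waking_too_early: bool = False


def has_insomnia_symptoms(form: IntakeForm) -> bool:
    """Apply criteria (i)-(iii); meeting any one is sufficient."""
    criterion_i = form.reason_insomnia
    criterion_ii = form.awakenings_per_night > 3 or (
        1 <= form.awakenings_per_night <= 3 and form.slow_to_fall_back_asleep
    )
    criterion_iii = (
        form.difficulty_falling_asleep
        or form.difficulty_staying_asleep
        or form.waking_too_early
    )
    return criterion_i or criterion_ii or criterion_iii
```

Because any one criterion suffices, the rule is deliberately inclusive, which is consistent with the expectation that some patients in the insomnia category would not meet formal diagnostic criteria.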

Statistics

Basic statistics were performed using Prism (GraphPad Software). The mismatch values for latency, number of awakenings and TST were generally non-normally distributed, and thus Kruskal–Wallis ANOVA (with Dunn's multiple comparison post hoc testing) was used for group comparisons. When considering the mismatch between subjective and objective estimates of sleep–wake durations, we utilized the absolute errors, rather than the percent errors, because the latter span several orders of magnitude (which complicates statistical inference, as shown in Supplemental Fig. S1). Correlation analysis, shown in the Supplemental Data S1, was performed with the non-parametric (Spearman) method, because many variables did not pass statistical criteria for normal distribution (not shown). Correlation coefficients are given with P-values without correcting for multiple comparisons. It is clear from the correlation matrix that nested correlations were present, and thus multiple regression was performed using Statistica, including partial correlation analysis. Additional correlation analysis is shown in the Supplemental Data S1.
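The difference between absolute and percent error can be illustrated with a short sketch. The function names and the example values are hypothetical and are not taken from the study data; the sign convention (negative values are underestimates) follows the one used for the TST error histogram.

```python
def absolute_error(subjective_min: float, objective_min: float) -> float:
    """Subjective minus objective duration, in minutes (negative = underestimate)."""
    return subjective_min - objective_min


def percent_error(subjective_min: float, objective_min: float) -> float:
    """The same difference expressed as a percentage of the objective duration."""
    return 100.0 * (subjective_min - objective_min) / objective_min


# Two hypothetical patients who both overestimate sleep latency by 28 min.
short_latency = (30.0, 2.0)   # (subjective, objective) in minutes
long_latency = (88.0, 60.0)

abs_short = absolute_error(*short_latency)   # 28 min
abs_long = absolute_error(*long_latency)     # 28 min
pct_short = percent_error(*short_latency)    # 1400 %
pct_long = percent_error(*long_latency)      # about 47 %
```

The same 28-min discrepancy becomes a 1400% error when the objective latency is 2 min but under 50% when it is 60 min, which is why percent errors can spread over orders of magnitude when objective durations are short.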

Naïve Bayes Classifier analysis was performed using the freely available software RapidMiner (http://rapid-i.com/content/view/181/190/lang,en/). This algorithm assumes all features used for prediction (such as age, sex, REM%, etc.) are independent of other feature values. While this assumption may not hold in all settings, this classifier performs unexpectedly well in many settings, can be applied to relatively small datasets compared with other classifiers, and is computationally inexpensive because it reduces multidimensional problems to multiple one-dimensional problems. The algorithm uses a maximum likelihood parameter estimation as well as K-fold cross-validation with stratified sampling. The training set is divided into K equal subsets (we used K = 20), from which K−1 of the subsets are used to train the algorithm. The remaining one subset is used to apply the learned classification. This train–test process is repeated such that each subset is classified once using training parameters based on the remaining data. The results are presented as a ‘confusion matrix’ indicating correctly and incorrectly classified subjects, such that sensitivity, specificity and predictive values can be easily calculated.
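The train–test procedure can be sketched in plain Python. This is not the RapidMiner implementation: it assumes Gaussian likelihoods for every feature (the text does not state the distribution used), and all function names are hypothetical. It is meant only to show how maximum likelihood estimation, the independence assumption and stratified K-fold cross-validation fit together.

```python
import math
import random
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Maximum likelihood class priors and per-feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small floor keeps the variance positive if a feature is constant.
        variances = [max(sum((v - m) ** 2 for v in col) / n, 1e-9)
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Choose the class with the highest log posterior, treating features as independent."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def stratified_folds(y, k, seed=0):
    """Deal the indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[(offset + j) % k].append(i)
        offset += len(indices)
    return folds


def cross_validate(X, y, k=20, positive=1):
    """Pool predictions over k train-test rounds into (tp, fp, fn, tn)."""
    tp = fp = fn = tn = 0
    for test_idx in stratified_folds(y, k):
        held_out = set(test_idx)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit_gaussian_nb([X[i] for i in train], [y[i] for i in train])
        for i in test_idx:
            guess = predict(model, X[i])
            if guess == positive and y[i] == positive:
                tp += 1
            elif guess == positive:
                fp += 1
            elif y[i] == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn
```

From the pooled counts, sensitivity is `tp / (tp + fn)` and specificity is `tn / (tn + fp)`. Each subject is classified exactly once, by a model that never saw that subject during training.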

Results

Characteristics of patient groups

Patients in the insomnia group were assigned on the basis of self-reported insomnia symptoms (see 'Materials and Methods'). Table 1 shows the baseline clinical features of the three patient groups: insomnia symptoms without OSA (‘insomnia alone’); insomnia symptoms with OSA (‘OSA + I’); and OSA alone (‘OSA’). Patients with insomnia alone were younger and had lower body mass index (BMI) values, which is not unexpected as increasing age and BMI are risk factors for OSA. The AHI and RDI values were lower, by definition, in the group with insomnia alone. Note that the OSA severity is biased against higher severity, as our lab institutes a ‘split-night’ criterion for severe cases, but this study used only full-night diagnostic studies. The mean Epworth Sleepiness Scale values were within the normal range in all groups (i.e. ≤10), and not different between groups. This is not surprising, as the Epworth Sleepiness Scale shows only a marginal relationship with OSA presence or severity; for example, in the large Sleep Heart Health Study, even the most severe OSA group had a mean (and upper standard deviation) under 10 (Gottlieb et al., 1999), and further machine learning and information theory methods showed little or no relationship with PSG metrics (Eiseman et al., 2012).

Table 1. Basic clinical characteristics

| | Insomnia alone | OSA + I | OSA alone |
|---|---|---|---|
| n | 92 | 119 | 66 |
| Age (years) | 43.7 (19–84)^a,b | 54.9 (22–82) | 50.8 (24–78) |
| % Male | 49 (39–59) | 63 (54–71) | 71 (60–82) |
| BMI | 28.6 (27.3–30.0)^b | 32.7 (32.4–34.0) | 31.2 (30.0–32.8) |
| Epworth Sleepiness Scale | 8.1 (7.0–9.1) | 6.9 (6.2–7.7) | 8.1 (6.9–9.2) |
| AHI | 1.5 (1.2–1.8)^a,b | 20.4 (17.7–23.1)^c | 16.9 (14.1–19.6) |
| RDI | 11.1 (9.1–13.0)^a,b | 35.5 (31.9–38.4) | 33.5 (29.7–37.4) |

Mean (95% CI of the mean), except for age, which has full range in parentheses. Significant values in bold.
^a Insomnia is different from OSA, P < 0.05.
^b Insomnia is different from OSA + I, P < 0.05.
^c OSA + I is different from OSA, P < 0.05.
AHI, apnea–hypopnea index; BMI, body mass index; OSA, obstructive sleep apnea; RDI, respiratory disturbance index.

With regard to PSG metrics, the basic sleep architecture values for each group are given in Table 2. We expected to find elevated metrics of sleep fragmentation in patients with insomnia. Interestingly though, we observed the least objective indication of fragmentation in the group with insomnia alone: they had less N1 and more N2, N3 and REM sleep stages, compared with the OSA and the OSA + I groups. The higher N3 and REM percentages might be related to the younger age of the insomnia group (Ohayon et al., 2004). They also had lower arousal index values and a lower frequency of stage transitions within sleep. Sleep efficiency was similar across groups.

Table 2. PSG parameters

| | Insomnia alone | OSA + I | OSA alone |
|---|---|---|---|
| TST (min) | 368 (354–383)^b | 334 (321–346)^c | 383 (369–398) |
| Efficiency (%) | 84.4 (81.7–87.1) | 78.4 (75.9–80.9) | 88.3 (85.7–90.9) |
| N1 (min) | 48.1 (42.9–53.3)^a,b | 68.6 (61.8–75.3) | 62.2 (51.0–73.5) |
| N1 (%) | 14.0 (12.2–15.6) | 21.6 (19.3–24.0) | 16.9 (13.5–20.2) |
| N2 (min) | 201 (191–211)^b | 179 (169–189)^c | 212 (198–225) |
| N2 (%) | 54.6 (52.6–56.5) | 55.3 (51.3–55.4) | 54.9 (52.4–57.5) |
| N3 (min) | 60.4 (53.5–67.2)^b | 41.5 (35.6–47.4) | 53.7 (44.8–62.6) |
| N3 (%) | 16.2 (14.5–18.0) | 12.3 (10.6–14.08) | 14.2 (11.9–16.5) |
| REM (min) | 58.8 (52.6–65.0)^b | 44.3 (39.1–49.5) | 55.5 (48.4–62.7) |
| REM (%) | 15.2 (13.8–16.6) | 12.7 (11.4–14.0) | 14.0 (12.3–15.8) |
| REM Lat (min) | 124 (108–139) | 157 (140–174) | 123 (105–141) |
| WASO (min) | 65.4 (55.7–75.2)^a,b | 91.8 (81.2–102)^c | 50.9 (39.4–62.4) |
| WASO (%) | 24.8 (14.4–35.2)^a,b | 33.4 (27.0–39.8)^c | 15.4 (10.8–20.1) |
| # Wakes (≥ 30 s) | 24.1 (21.5–26.7) | 30.3 (27.2–33.4) | 22.2 (18.4–25.9) |
| # Sleep transition | 74 (69–80)^a | 86 (80–91) | 97 (86–108) |
| AI (total) | 19.0 (17.0–21.0)^a,b | 37.2 (34.0–40.3) | 35.2 (31.6–38.8) |
| Periodic limb movement index | 12.0 (7.6–16.3) | 16.1 (8.7–23.5) | 10.5 (5.6–15.4) |

Data are mean (95% CI of the mean). Significant values in bold.
^a Insomnia is different from OSA, P < 0.05.
^b Insomnia is different from OSA + I, P < 0.05.
^c OSA + I is different from OSA, P < 0.05.
AI, arousal index; REM, rapid eye movement; TST, total sleep time; WASO, wake after sleep onset.

The failure to observe increased fragmentation in the insomnia alone group is attributed in part to the presence of apnea-related arousals and fragmentation in the other two groups, a well-understood and expected finding, given that they all had OSA based on AHI > 5. Thus, there was little evidence to support our working hypothesis that patients reporting insomnia symptoms (without co-morbid OSA) suffered from particular deficits in REM or deep non-REM sleep, or from excess N1 sleep, reduced sleep efficiency or increased number of awakenings, at least not in comparison to the degree of sleep architecture fragmentation seen in patients with OSA but no insomnia symptoms.

Subjective–objective mismatch of sleep and wake durations

We compared subjective post-PSG TST estimate with the PSG-based objective TST measurements to determine the extent to which mismatch occurred in each of the three groups (Fig. 1a).

Figure 1.

Group differences in total sleep time (TST), latency to persistent sleep (LPS) and wake after sleep onset (WASO). Box and whisker plots show the median (central line), mean (‘+’), interquartile range (box edges) and 90% confidence intervals (whiskers). Values for TST are given in (a), for LPS in (b), and WASO in (c). The obstructive sleep apnea (OSA), Insomnia, and OSA + I labels in (a) apply to (b) and (c) as well. The brackets indicate significant differences according to ANOVA with post hoc corrections [parametric for (a) and rank non-parametric for (b) and (c)].

Patients with insomnia alone showed significant underestimation of TST; the difference in the median value of the subjective and objective TST values in this group was 81 min (Fig. 1a). This finding is consistent with a large number of prior studies on mismatch mentioned in the Introduction. In contrast, there were no significant differences in the median values for subjective versus objective TST for the other two groups; a prior study showed only mild underestimation (<20 min of ~6 h TST) in patients with OSA (McCall et al., 1995). The objective TST was similar between those with OSA and those with insomnia alone, but was significantly lower in the OSA + I group, by a median of ~43 min. The majority of patients in the insomnia group slept more than 6 h total during the PSG (and only three patients had TST values under 4 h). Thus, patients with severe objective sleep disturbance (insofar as TST may index this) are not well represented, and our findings may not generalize to such individuals.

We also compared the mismatch by first calculating the subjective–objective difference within individuals, then comparing the medians among the three groups. The mean ± SD values at the individual level were 66 ± 87 min underestimation for the insomnia group, 17 ± 78 min underestimation for the OSA group, and 36 ± 88 min underestimation for the OSA + I group. Underestimation in the insomnia group was significantly greater than in the other two groups (Kruskal–Wallis with Dunn's post hoc test, P < 0.002).

We note that heterogeneity in the insomnia group would be predicted to decrease the chances of demonstrating mismatch between objective and subjective parameters in this group. For example, co-occurrence of restless legs syndrome (RLS) symptoms might influence the results. However, we found that only six subjects in the insomnia group reported RLS symptoms; this group slept a median of 8% longer and had a median subjective TST that was 15 min shorter than the remainder of the group. Regarding periodic leg movements, the median value was <5 per hour, and 22 subjects had indices over 15 per hour. TST underestimation was not statistically different between subjects in the insomnia group with periodic limb movement index <15 (n = 72) versus >15 (n = 20; data not shown).

Wake times showed mismatch, but the direction was context dependent (Fig. 1b and c). The sleep latency, defined here as the duration of wake from the time of lights-off until the onset of persistent sleep (10 minutes duration; LPS), was overestimated similarly in all three groups. The median mismatch was 10–20 min, although a long tail of variance extended into more profound overestimations. The objective latency was significantly increased in the insomnia group compared with the OSA group (Fig. 1b). The second context of wake time estimation focused on the duration of wake after sleep onset (WASO). Surprisingly, the amount of WASO was underestimated in all three groups, although there was substantial scatter in the data (Fig. 1c).

In the subsequent sections, we focus our attention on the mismatch of TST for three main reasons. First, whether the clinical complaint involves problems with sleep latency, sleep maintenance or early morning awakening, the final common symptom is the clinical report of insufficient sleep duration. Second, predicting mismatch in TST has straightforward hypotheses linked to sleep architecture, and we will test these directly in the sections to follow. Third, the absolute and percent errors are tightly related and thus analysis is straightforward (see Supplemental Fig. S1). Future analysis involving electroencephalogram (EEG) spectra and stage-transition patterns may shed additional light on mismatch in time spent awake.

Correlates of subjective TST estimation

We next explored possible PSG correlates of subjective TST for the group with insomnia alone. First, we observed a modest but significant correlation between subjective and objective measures of TST (r = 0.39; P < 0.0002). This correlation, while not unexpected, has statistical implications for investigating correlations between sleep architecture and subjective estimations, as the architecture metrics may be correlated with objective TST. We thus investigated the correlation matrix among subjective TST and various objective metrics from the PSG (Supplemental Fig. S2). We found that several objective features of the PSG correlated positively with the objective TST, such as time spent in each sleep stage, and inversely with other features, such as the number of awakenings and the time spent in wake. Although not unexpected, this finding necessitates caution when exploring PSG correlates of subjective TST, as subjective and objective TST values are significantly correlated.

We thus undertook a multiple regression analysis, with subjective TST as the dependent variable, and the PSG metrics shown in Supplemental Fig. S2 as the independent variables. Initial regression included some of the strongest apparent correlates: objective TST, number of 30-s duration awakenings, and time spent in N2 and REM sleep. This model showed an overall adjusted R² value of 0.18 (P < 0.0003), but none of the individual coefficients was significant. A similar adjusted R² value (0.16; P < 0.001) was found when only time in REM sleep and N2 were included in the equation – and in this case, only the REM coefficient was significant. When the only regression terms were REM time and objective TST, only the TST coefficient was significant in predicting subjective TST. Thus, the PSG correlates may only add a small amount to the predictive value of the model, beyond their inherent correlations with objective TST, which itself correlated only modestly with subjective TST as described above.

Adjusted variants of TST

We next tested the hypothesis that certain stages of sleep, as currently defined, are actually perceived subjectively by patients as wakefulness. For example, time spent in stage N1 may not be consciously processed as sleep (Ogilvie, 2001). As another example, psychophysics experiments suggest that certain simple if-then tasks can be performed with ~75% accuracy during stage N1, and ~25% accuracy during N2 (Ogilvie, 2001). Finally, it is possible that a minimum duration of sleep is required to cognitively register that sleep has occurred (Mercer et al., 2002; Weigand et al., 2007). If true, any one of these hypotheses could account for some of the variance in TST mismatch. Thus, we compared subjective TST against three alternative calculations of ‘functional’ objective TST (Fig. 2).

Figure 2.

Total sleep time (TST) mismatch under baseline and different TST weighting conditions. (a) Subjective TST estimates are plotted against the corresponding objective TST derived from the PSG. In this and subsequent panels, the dotted line represents perfect accuracy of the subjective estimate. (b) The subjective TST estimates are plotted against an adjusted objective TST: the sum of N2, N3 and REM sleep (i.e. ignoring time spent in N1 sleep). (c) The subjective TST estimates are plotted against an adjusted objective TST: N1 time was downscaled by a multiple of 0.25, and N2 was downscaled by a multiple of 0.75. (d) The subjective TST estimates are plotted against an adjusted objective TST: only bouts of sleep lasting at least 10 min (regardless of stage) were included. (e) Box and whisker plots of the TST error in each case (a–d). (f) Cumulative distribution functions of the same data as in (e).

Fig. 2a shows a scatter plot of subjective versus objective TST values in patients with insomnia alone. We then compared the same subjective TST estimates with recalculated objective TST values as follows: without including stage N1 (Fig. 2b); weighted according to graded response accuracy in stages N1 and N2 (Fig. 2c); and requiring a minimum of 10 consecutive minutes of any sleep stage (Fig. 2d). In each case the data points moved leftward in the scatter plots, as expected as each TST recalculation necessarily reduced the TST values. However, the variance of the data points was similar, indicating that the explanatory power was unchanged. We further evaluated the distribution of TST errors under the standard and modified TST calculations using box-and-whisker plots (Fig. 2e) and cumulative probability distributions (Fig. 2f).
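The three recalculations can be sketched from a scored hypnogram. The sketch assumes standard 30-s scoring epochs, stage labels `W`, `N1`, `N2`, `N3` and `REM`, and that a sleep bout means a run of consecutive sleep epochs uninterrupted by wake; these conventions and all function names are assumptions for illustration, not a description of the software used in the study.

```python
from itertools import groupby

EPOCH_MIN = 0.5                       # assumes 30-s scoring epochs
SLEEP = {"N1", "N2", "N3", "REM"}
SLEEP_NO_N1 = {"N2", "N3", "REM"}
# Weights as in Fig. 2c: N1 scaled by 0.25 and N2 by 0.75.
# Assumption: N3 and REM are counted in full.
WEIGHTS = {"N1": 0.25, "N2": 0.75, "N3": 1.0, "REM": 1.0}


def tst_standard(hypnogram):
    """Conventional TST: every epoch scored as any sleep stage."""
    return EPOCH_MIN * sum(stage in SLEEP for stage in hypnogram)


def tst_without_n1(hypnogram):
    """Adjusted TST treating N1 as wake (Fig. 2b)."""
    return EPOCH_MIN * sum(stage in SLEEP_NO_N1 for stage in hypnogram)


def tst_weighted(hypnogram):
    """Adjusted TST with N1 and N2 down-weighted (Fig. 2c)."""
    return EPOCH_MIN * sum(WEIGHTS.get(stage, 0.0) for stage in hypnogram)


def tst_min_bout(hypnogram, min_bout_min=10.0):
    """Adjusted TST counting only sleep bouts of at least min_bout_min (Fig. 2d)."""
    total = 0.0
    for asleep, run in groupby(hypnogram, key=lambda stage: stage in SLEEP):
        bout = EPOCH_MIN * len(list(run))
        if asleep and bout >= min_bout_min:
            total += bout
    return total


# A short hypothetical recording, listed as (stage, number of epochs).
example = [("W", 10), ("N1", 4), ("N2", 30), ("W", 2), ("N2", 10),
           ("N3", 20), ("REM", 6), ("W", 4), ("N1", 6)]
hypnogram = [stage for stage, count in example for _ in range(count)]
```

For this hypothetical hypnogram the four functions give 38.0, 33.0, 29.25 and 35.0 min respectively. Every adjustment can only remove or down-weight sleep, so each adjusted value is at most the standard TST, which is why the points shift leftward in the scatter plots.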

In summary, these methods indicate that the subjective–objective mismatch was not improved. In other words, the variance in mismatch is not well explained by any of these factors (ignoring stage N1, weighting of N1 and N2, or minimum sleep duration).

Naïve Bayes Classifier performance

Given that multiple factors likely contribute to subjective–objective mismatch, we attempted mismatch prediction using a classification algorithm. Fig. 3a shows the distribution of TST errors (negative values are underestimates) for the group with insomnia alone (n = 92). Although the lack of a bimodal pattern precludes an obvious cut-off choice, the value of 60 min of underestimation of TST is a clinically reasonable starting point and separates the group into n = 41 (44.6%) subjects with mismatch. We used 26 variables (Fig. 3) to predict the classification category: mismatch or not. Fig. 3b shows the modest performance of the classifier, with 73.1% sensitivity and 49.0% specificity. These values allow calculation of positive and negative likelihood ratio (LR) values: the LR(+) was 1.4 and the LR(−) was 0.7. Accordingly, due to the LR values being close to 1, the predictive values offered by the classifier reflect only small percentage changes relative to the baseline prevalence of each category. Shifting the cut-off to a more extreme value of 120 min of underestimation yielded similar performance of the algorithm: the LR(+) was 1.5 and the LR(−) was 0.7.
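The link between likelihood ratios and predictive values can be made concrete with a short sketch. It uses the standard definitions LR(+) = sensitivity / (1 − specificity) and LR(−) = (1 − sensitivity) / specificity, and converts a baseline prevalence to a post-test probability through the odds. The function names are hypothetical, and ratios recomputed from rounded percentages need not match the reported values exactly.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) from sensitivity and specificity given as fractions."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(prevalence: float, lr: float) -> float:
    """Update a baseline probability with a likelihood ratio via the odds."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


# Baseline prevalence of mismatch in the insomnia-alone group: 41 of 92.
prevalence = 41 / 92

# Applying the LR values reported in the text for the 60-min cut-off.
after_positive = post_test_probability(prevalence, 1.4)   # about 0.53
after_negative = post_test_probability(prevalence, 0.7)   # about 0.36
```

Starting from a prevalence of about 45%, a positive classification raises the probability of mismatch only to roughly 53%, and a negative one lowers it to roughly 36%, which illustrates why LR values close to 1 carry little predictive weight.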

Figure 3.

Naïve Bayes Classifier performance. (a) Histogram of total sleep time (TST) error among those with insomnia only. Negative values represent underestimation. The fraction of the cohort (n = 92) exhibiting binned errors (x-axis) is shown on the y-axis. (b) Results of the Naïve Bayes Classifier, using 26 variables in the algorithm (age, sex, BMI, Epworth Sleepiness Scale, spontaneous AI, total AI, periodic limb movement index, AHI, RDI, LPS, # 30-s wakes, # 60-s wakes, TST, WASO, N1, %N1, N2, %N2, N3, %N3, REM, %REM, wake, %wake, # transitions within sleep, # transitions total). The left panel shows results when the cutoff of ‘mismatch’ class was 60 min underestimation, while the right panel shows the results of the algorithm applied when the cutoff of ‘mismatch’ class was 120 min underestimation. NPV, negative predictive value; PPV, positive predictive value.

We then attempted to classify mismatch in the combined group of patients reporting insomnia symptoms regardless of AHI or RDI scores. Similar to the insomnia alone cohort, there is no obvious cut-off value in the distribution of errors in this group (not shown). Using the threshold of 60 min of underestimation, the prevalence of mismatch was 82 out of 211 (38.9%). Classification performance remained poor, with 68.3% sensitivity and 38.8% specificity, yielding an LR(+) of 1.1 and an LR(−) of 0.9. Repeating the classification algorithm, using an adjusted TST that ignored stage N1, improved the classifier performance, but the sensitivity and specificity were still modest and similar to that found in Fig. 3b, yielding an LR(+) of 1.5 and an LR(−) of 0.7.

Discussion

The distinction between sleep and wakefulness has important implications for the diagnosis, treatment and monitoring of sleep disorders. Patient perceptions of their sleep and wake durations may not accord with objective measures by current conventions. It is critical, therefore, to anticipate any potential disparities between subjective and objective measures of sleep or wakefulness. This study offers several observations regarding the mismatch between subjective and objective sleep–wake times, comparing these findings across insomnia or sleep apnea. We dispel common myths that might explain these differences and offer future directions to further explore this important topic.

Our study demonstrates five main findings about subjective–objective mismatch of the perception of sleep or wakefulness at night: (i) Substantial TST mismatch (>80 min underestimation compared with objective measurement) was common in patients who self-report insomnia; (ii) TST mismatch was less apparent in patients with OSA, whether or not they also had insomnia; (iii) TST mismatch was not explained by either shallow sleep (i.e. N1) or short sleep bouts being mistakenly perceived as wake; (iv) Subjective TST correlated modestly with objective TST, but not with other PSG metrics; (v) Perhaps most surprising, wake time mismatch was context dependent. Specifically, people overestimated the time it took to initially fall asleep, while underestimating WASO. Our findings are consistent with a growing literature documenting subjective–objective mismatch in patients with insomnia. One study that showed an average of 1-h underestimation in chronic insomnia only found it among patients with objective sleep time >6 h (Fernandez-Mendoza et al., 2011). Some studies showed less mismatch than our study (Bonnet and Arand, 1997, 2003; Edinger and Fins, 1995; Martinez et al., 2010; Schneider-Helmert and Kumar, 1995; Vanable et al., 2000), some reports showed mismatch magnitudes similar to our findings (Mercer et al., 2002; Salin-Pascual et al., 1992; Tang and Harvey, 2006), while others showed greater magnitude of underestimation (>3 h; Manconi et al., 2010; Parrino et al., 2009). We do note that the populations studied in each of these studies differed from each other and from our study, which may account for some of the variation. In summary, our hypothesis that subjective–objective mismatch is driven by sleep fragmentation was not supported.

Subjective sleep

Total sleep time underestimation was significant only in the group with insomnia who did not have concurrent OSA. The TST mismatch was not attributable to any of three hypotheses that might account for the disparities. Specifically, mismatch was not explained by time spent in stage N1, or by duration of sleep between awakenings, or by an adjustment for possible persistent cognitive processing in stages N1 and N2 (Fig. 2). We observed substantial variance in the extent of mismatch between subjective estimates and objective measurement of TST. Certain objective metrics of sleep architecture showed no correlation with subjective TST, such as slow-wave sleep, arousal index, limb movements or number of transitions within sleep. Other objective metrics that did show correlations on exploratory analysis (Supplemental Fig. S2) did not remain significant in regression models that also included objective TST, which itself had significant correlations with sleep architecture. Consistent with the encountered challenges of predicting mismatch by routine clinical metrics, the Naïve Bayes Classifier performance was modest (Fig. 3).
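The three candidate adjustments to objective TST can be sketched as follows. This is an illustrative sketch only, not the code used in the study: it assumes a hypnogram stored as a list of 30-s epoch labels, all function names are hypothetical, and the stage weights in `tst_weighted` are placeholder values chosen for illustration (the weights actually tested are not stated in this section).

```python
from itertools import groupby

EPOCH_MIN = 0.5                      # assumed 30-s scoring epochs
SLEEP = {"N1", "N2", "N3", "REM"}    # any other label (e.g. "W") counts as wake


def tst(stages):
    """Conventional total sleep time in minutes."""
    return sum(EPOCH_MIN for s in stages if s in SLEEP)


def tst_n1_as_wake(stages):
    """Hypothesis 1: stage N1 is perceived as wake."""
    return sum(EPOCH_MIN for s in stages if s in SLEEP and s != "N1")


def tst_short_bouts_as_wake(stages, min_bout_min=10.0):
    """Hypothesis 2: continuous sleep bouts shorter than min_bout_min are perceived as wake."""
    total = 0.0
    for is_sleep, run in groupby(stages, key=lambda s: s in SLEEP):
        duration = sum(1 for _ in run) * EPOCH_MIN
        if is_sleep and duration >= min_bout_min:
            total += duration
    return total


def tst_weighted(stages, weights=None):
    """Hypothesis 3: N1 and N2 contribute only partially to perceived sleep."""
    if weights is None:
        # Hypothetical weights for illustration only.
        weights = {"N1": 0.0, "N2": 0.5, "N3": 1.0, "REM": 1.0}
    return sum(EPOCH_MIN * weights.get(s, 0.0) for s in stages)


# Toy hypnogram (not patient data): a 5-min N1 bout, a 35-min N2/N3 bout, a 10-min REM bout.
example_hypnogram = (["W"] * 20 + ["N1"] * 10 + ["W"] * 4 + ["N2"] * 40
                     + ["N3"] * 30 + ["W"] * 2 + ["REM"] * 20 + ["W"] * 6)

for name, fn in [("conventional", tst), ("N1 as wake", tst_n1_as_wake),
                 ("short bouts as wake", tst_short_bouts_as_wake),
                 ("weighted N1/N2", tst_weighted)]:
    print(f"{name}: {fn(example_hypnogram):.1f} min")
```

Each adjusted value could then be compared with the subjective TST in place of the conventional value to check whether the adjustment reduces the mismatch.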

Subjective wake

Estimation of waking times showed significant mismatch in all groups, and was notably context dependent: the wake time comprising sleep latency was overestimated, while the wake time within the sleep period (WASO) was underestimated. We find it tantalizing to consider the possibility that the process of emerging from sleep within the night alters either the perception of time or encoding of new memory of being awake. (Consider, for instance, anecdotes of people being awoken in the middle of the night by a telephone call, and having little or no memory for the content of that call, even though they were awake for the conversation.) This could be related, for example, to the phenomenon of sleep inertia, as patients transitioning from sleep to wake may not immediately achieve cognitive capacity typical of daytime wakefulness.

The notion that wakefulness immediately adjacent to episodes of sleep might disrupt memory would appear, on first inspection, to counter reports that demonstrate that sleep physiology tends to boost memory (Ellenbogen et al., 2006). However, those studies demonstrate an enhancement of existing memories from the previous day(s), as opposed to acquisition of brand new memories.

Sleep architecture correlates of subjective TST estimation

How people estimate the passage of time while asleep remains poorly understood (Harvey and Tang, 2012). In this study, significant mismatch between subjective and objective TST was only observed in the group reporting insomnia symptoms who did not also have OSA. In this group, subjective TST was positively correlated with objective TST, which was itself correlated with other sleep architecture metrics. Although we observed negative correlations between subjective TST and measures of wakefulness and N1 sleep, these markers of fragmentation were also correlated with objective TST. Multiple regression analysis suggests that if further variance in subjective experience is related to these features, beyond what is explained by TST itself, that the effects are either small or highly heterogeneous.

From a cognitive standpoint, we assessed the extent to which subjects might be guessing at these estimates, which would introduce confounding variance, by requiring a measure of certainty for all estimates. If guessing was random, this would lead to regression to the mean (i.e. no correlation between self-reported certainty and the associated error). However, it is possible that some patients employ a heuristic for estimation that translates their uncertainty into underestimates of sleep time (i.e. ‘I didn't sleep well, so I'm guessing I didn't sleep much’). Although there was some suggestion of pessimism in the estimation of sleep latency in the insomnia group, there was no correlation of the TST error or the WASO error with the certainty of each of these estimates (data not shown).

As mentioned by others (Edinger and Krystal, 2003; Manconi et al., 2010; Vanable et al., 2000), heterogeneity is seen in the distribution of TST mismatch errors: in our study, half of the subjects with insomnia alone underestimated by at least 80 min, but only a small fraction overestimated TST by that amount. It is also worth noting that the absolute magnitude of mismatch may have context-dependent implications. For example, underestimating sleep by 80 min may not be as concerning for the patient already sleeping more than 8 h compared with another patient only sleeping, for example, 4 h. In this regard, Manconi et al. (2010) proposed a metric of mismatch that places the absolute error in the context of the TST value. This index has the added benefit of constraining the mismatch to between 0 and 1 for those who underestimate their TST. In their study, this index correlated highly (correlation of 0.95) with the absolute mismatch value, which was used in the current study. The method of characterizing mismatch could impact phenotyping efforts, as the calculation methods may impact the resulting distributions. If treatment strategies are to be tested and eventually targeted based on phenotyping, the methods of phenotyping must be carefully considered.
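A relative index of this kind can be sketched as below. The formula, (objective TST − subjective TST) / objective TST, is our reading of the description above (absolute error placed in the context of TST, bounded between 0 and 1 for underestimators) and should be treated as an assumption rather than a verbatim reproduction of the published definition; the function name is hypothetical.

```python
def misperception_index(objective_tst_min: float, subjective_tst_min: float) -> float:
    """Relative mismatch: fraction of objectively measured sleep that was not perceived.

    Positive values indicate underestimation (0 to 1 when subjective TST is between
    0 and the objective TST); negative values indicate overestimation.
    """
    if objective_tst_min <= 0:
        raise ValueError("objective TST must be positive")
    return (objective_tst_min - subjective_tst_min) / objective_tst_min


# The same 80-min underestimation in two contexts, following the example in the text.
long_sleeper = misperception_index(480, 400)   # objective 8 h, reported 6 h 40 min
short_sleeper = misperception_index(240, 160)  # objective 4 h, reported 2 h 40 min

print(f"8-h sleeper: {long_sleeper:.2f}")
print(f"4-h sleeper: {short_sleeper:.2f}")
```

The identical absolute error yields an index about twice as large for the 4-h sleeper as for the 8-h sleeper, which is the context dependence the relative metric is meant to capture.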

It is possible that clustering or similar methods of objectively phenotyping subpopulations may lead to further insights into insomnia mechanisms. Clearly, objective sleep measurements in patients with insomnia are characterized by variability (Edinger et al., 1991; Means et al., 2003), which supports the need for ongoing phenotyping efforts. Patients with insomnia also differ by non-sleep clinical characteristics, such as psychiatric and personality scales, which may also be useful for phenotyping (Fernandez-Mendoza et al., 2011). Patients with insomnia may also differ by subjective reports of sleep-related topics: subgrouping has been reported in terms of magnitude of TST underestimation according to self-reported periodic limb movements in sleep, perceived causes of their insomnia, and the presence of both onset and maintenance difficulties (Edinger and Fins, 1995).

Formal cluster analysis using a large dataset of insomnia and normal sleepers revealed 14 subgroups (based on factor analysis of 38 chosen clinical variables), and interestingly these empirically derived clusters did not correlate well with clinical diagnostic phenotyping according to ICSD and psychiatric criteria (Edinger et al., 1996). These findings lend support to the idea that routine clinical phenotyping may be augmented by more formal (i.e. statistical) methods of phenotyping. Another report indicated four clusters of sleep perception patterns in patients with insomnia, where most subjects fell into the accurate group or the stable mild underestimation group, while one smaller subgroup showed marked underestimation, and the final subgroup showed overestimation (Means et al., 2003). More recent use of clustering approaches has revealed patterns of self-reported sleep in subjects with insomnia, in which three groups were evident: an unpredictable pattern; a high probability of insomnia; and a low probability of insomnia (Vallieres et al., 2011). One can imagine considering multiple factors including temporal predictability of subjective and objective sleep, and thus also the degree of mismatch, as well as evidence of homeostatic response after 1–2 ‘bad nights’. Such empiric phenotyping may offer a complementary view of insomnia relative to the current diagnostic subclassifications based on clinical history alone. This type of analysis may benefit from larger populations to ensure adequate sampling of potential subpopulations, as well as repeated measures to determine whether state or trait factors may influence the mismatch. It will be particularly important to include patients with insomnia with a spectrum of objective sleep difficulty (based on metrics such as TST, sleep efficiency); patients with severe objective sleep disturbance were not well represented in the current cohort.

Clinical implications

From a diagnostic standpoint, when restricting case definitions to the specific circumstance of a patient with insomnia complaints who has normal sleep architecture, it has been estimated that so-called paradoxical insomnia has a low prevalence of <5% (ICSD 2005). However, discrepancies between subjective experience and objectively measured sleep may be more common and may overlap with other causes of insomnia (Bonnet and Arand, 1997; Edinger and Fins, 1995; Edinger and Krystal, 2003; Edinger et al., 2000; Fernandez-Mendoza et al., 2011; Frankel et al., 1976; Manconi et al., 2010; Means et al., 2003). Our results are in line with the findings reported in these studies, with respect to underestimation of TST. However, some studies (Manconi et al., 2010) have shown a bimodal distribution of mismatch, with a population with extreme mismatch distinct from those with accurate perception or less severe mismatch. Mismatch may of course be part of the presentation of insomnia in general, most particularly evident in the paradoxical subtype, where it is a hallmark and perhaps a categorical rather than dimensional feature. However, it is also important to recognize that a continuum of misperception may also be present in other forms of insomnia, including the most common subtype of psychophysiological insomnia. Recognition of mismatch has important implications from a diagnostic and therapeutic standpoint. Assessing a patient's subjective experience of sleep in the laboratory setting may provide useful information for the patient and treating providers, by informing the extent to which mismatch might be relevant to that individual.

It is worth noting that subjective reporting of sleep duration may vary with the time frame of recollection. One study showed that immediate report (morning diary) for seven consecutive days of objective sleep monitoring differed from a retrospective self-assessment completed at the end of the study in which subjects gave global estimates of the prior week (Fichten et al., 2005). Specifically, the retrospective gestalt estimate showed an exaggeration compared with the average of the underestimations provided each morning for the prior week. One could speculate that this exaggeration reflects a trait of some patients with insomnia. Considering that clinic patients reporting insomnia symptoms are often asked to recollect averages or trends in sleep–wake times over weeks or months or longer, such findings as reported by Fichten et al. (2005) serve to emphasize the uncertainty inherent in the clinical assessment of insomnia. When evaluating subjective sleep complaints, sleep diaries may provide some insight into this issue of immediate versus long-term retrospective estimations. Diaries do not, however, directly address the question of mismatch, with the possible exception that non-physiological reports (for example, several days or weeks of little or no sleep with no homeostatic rebound) are more likely to represent mismatch.

From a symptomatic standpoint, there are mixed data regarding the daytime consequences of insomnia (Riedel and Lichstein, 2000). In one interesting study, in which normal subjects were ‘yoked’ to the patterns of fragmentation seen in patients with insomnia, although objective sleepiness and mood disturbance increased, personality scales and subjective sleep perception remained intact (Bonnet and Arand, 1996). Interestingly, it has been shown that poor memory performance and other functional/mood assessments may be independently linked to subjective impression of sleep as well as objective EEG findings (Rosa and Bonnet, 2000). This suggests that identifying and addressing misperception might be of therapeutic benefit.

The extent of mismatch may provide practical information that could direct treatment. A simplified framework could consider whether or not objective sleep abnormalities are present, and within each of these two binary categories, whether subjective estimates were accurate or not. Although each of these axes is likely a continuum, the dichotomous construct may be useful to consider four possible combinations, but even this simplified framework could not be employed without objective measurement. Repeated measures might prove useful to distinguish state versus trait pathophysiology of mismatch.

Comparing the PSG sleep time with the subjective estimates, ideally over multiple nights, could be used to stratify patients with insomnia according to who might respond positively to feedback regarding their objective sleep durations. Perception of sleep may be malleable by non-pharmacological means, as has been recently demonstrated (Tang and Harvey, 2006). Over 40 years ago, it was suggested that one could be trained to more accurately guess the stage of sleep from which one was awoken (Antrobus, 1967), suggesting that feedback about sleep physiology can be quite powerful. In fact, feedback-driven training has been effective in one study of insomniacs (Downey and Bonnet, 1992). It is intriguing that even ‘random’ feedback (unrelated to actual sleep measurements) can shape waking function (Semler and Harvey, 2005), again emphasizing the potential impact of feedback in non-pharmacological insomnia management. A recent small case series (n = 4) suggested that providing feedback related to the PSG findings improved mismatch only in the two subjects with fairly normal sleep (Geyer et al., 2011).

One potential reason this feedback strategy has not been widely employed is that multiple nights of PSG are inconvenient and costly. From a management standpoint, it could be that the extent of mismatch influences the types of hypnotic (or even non-pharmacological) treatments that are undertaken. The correlations between subjective TST and sleep architecture are interesting in this context. Our results raise the possibility that competing influences of medications on time spent in these stages may account for some of the variance in clinical responses. For example, benzodiazepine hypnotics may decrease N1 and WASO, and increase N2 while decreasing REM and slow-wave sleep. However, benzodiazepines and possibly other sleep aids may alter either the cognitive estimation of time or memory of the passage of time, and teasing these factors apart is deserving of further study. It has been shown that the subjective response to benzodiazepine hypnotics may exceed objective improvement of sleep time (Holbrook et al., 2000); a portion of the efficacy of benzodiazepines may be related to the amnestic properties. A greater subjective than objective benefit has also been reported, however, for non-pharmacological interventions (Epstein et al., 2012). The basis for disproportionate subjective benefit in cognitive behavioral therapy versus pharmacological treatments remains uncertain. For patients with prominent mismatch, who may be achieving a reasonable number of hours of sleep per night, one would question whether the risks of long-term hypnotic therapy outweighed the arguably non-medical benefit of improving perception alone. From a non-pharmacological standpoint, perhaps with the growing availability of home sleep monitors, feedback-driven management of insomnia may become more commonplace.

Another important treatment consideration relates to longitudinal management of insomnia. It has been suggested that mismatch is a perpetuating factor for those with chronic insomnia (Mercer et al., 2002). This finding raises the important but poorly understood issue of whether mismatch is a ‘trait’ that is difficult to reverse, or whether it is a ‘state’ linked to behaviors or other variables contributing to sleep fragmentation. One could imagine that patients with a trait type of mismatch might find it challenging to wean from hypnotic use; conversely, it is possible that those with a state type of mismatch might be more amenable to behavioral interventions and less likely to need chronic hypnotic treatment.

Finally, the spectrum of mismatch is relevant for interpretation of large epidemiological studies on sleep duration (Bliwise and Young, 2007; Kessler et al., 2011; Phillips and Mannino, 2005). These studies are often, by practical necessity, limited to self-reporting of sleep duration (although, see Vgontzas et al., 2010). Among the concerns regarding interpretation of such studies, the heterogeneity may arise from the confounds of mismatch as well as altered retrospective (compared with immediate) self-reporting (Fichten et al., 2005).

Limitations and future directions

We recognize several limitations that could be addressed in future studies. First, self-reported insomnia complaints were documented by intake form as a routine part of clinical PSG, and thus we lack formal diagnostic characterization. Careful clinical phenotyping as described above (temporal patterns of subjective and objective sleep) may shed light on possible correlations of mismatch with clinically defined insomnia subtypes. Second, because these studies were performed for clinical purposes, we are limited to a single night of sleep in each patient. The question of state versus trait mismatch thus cannot be addressed without repeated measures; further work using multiple nights of laboratory study, or home sleep monitoring, is needed to address this question. Night-to-night variability in sleep quality and quantity has been suggested in the literature, and the variability itself (or lack thereof) may be an important feature of improved and objective insomnia phenotyping. Third, because our database consists of PSGs performed for clinical purposes, we lack a normal control group against which to compare the sleep architecture values.

Future studies are needed to better understand the subjective perception of time during sleep and wakefulness, and to better quantify objective elements of sleep physiology. Current standards for objective assessment of sleep contain conventions that might not capture those aspects most salient to patients' subjective experience. Advanced signal-processing techniques might enhance this process, providing a more refined, objective appraisal of sleep. Similarly, a better understanding of the psychology of time perception, in both health and disease, might better refine the subjective metric.

Acknowledgements

The authors thank Drs Catherine Chu, Elizabeth Klerman, Andrew Philips and Wei Wang for valuable suggestions, and Karen Gannon for data assistance. Dr Bianchi received funding from the Department of Neurology, Massachusetts General Hospital, a Harvard Catalyst KL2 Medical Research Investigator Fellowship, and the Clinical Investigator Training Program: Harvard/MIT Health Sciences and Technology – Beth Israel Deaconess Medical Center, in collaboration with Pfizer, Inc. and Merck & Co. Dr Ellenbogen received funding from the Department of Neurology, Massachusetts General Hospital.

Conflict of Interest

MTB has a patent pending on a sleep-monitoring device.
