Test–retest reliability of the Epworth Sleepiness Scale in clinical trial settings

Summary
The present analysis examined the test–retest reliability of the Epworth Sleepiness Scale in participants with excessive daytime sleepiness associated with narcolepsy or obstructive sleep apnea in three clinical trials. Intraclass correlation coefficient estimates for Epworth Sleepiness Scale scores from two solriamfetol 12‐week placebo‐controlled trials (one narcolepsy, one obstructive sleep apnea) and one long‐term open‐label extension trial (narcolepsy or obstructive sleep apnea) were calculated using postbaseline time‐point pairs for the overall population in each trial, by treatment, and by primary obstructive sleep apnea therapy adherence. In the 12‐week narcolepsy trial, intraclass correlation coefficients (95% confidence intervals) were 0.83 (0.79, 0.87) for weeks 4 and 8 (n = 199), 0.87 (0.83, 0.90) for weeks 8 and 12 (n = 196), and 0.81 (0.76, 0.85) for weeks 4 and 12 (n = 196). In the 12‐week obstructive sleep apnea trial, intraclass correlation coefficients (95% confidence intervals) were 0.74 (0.69, 0.78) (n = 416), 0.80 (0.76, 0.83) (n = 405), and 0.74 (0.69, 0.78) (n = 405), respectively. In the open‐label extension trial, intraclass correlation coefficients (95% confidence intervals) were 0.82 (0.79, 0.85) for weeks 14 and 26/27 (n = 495), 0.85 (0.82, 0.87) for weeks 26/27 and 39/40 (n = 463), and 0.78 (0.74, 0.81) for weeks 14 and 39/40 (n = 463). Placebo/solriamfetol treatment or adherence to primary obstructive sleep apnea therapy did not affect reliability. In conclusion, across three large clinical trials of participants with narcolepsy or obstructive sleep apnea, Epworth Sleepiness Scale scores demonstrated a robust, acceptable level of test–retest reliability in evaluating treatment response over time.


| INTRODUCTION
The Epworth Sleepiness Scale (ESS) is a patient-reported questionnaire that measures excessive daytime sleepiness (EDS) by assessing situational sleep propensities (Johns, 1991). Specifically, the ESS is composed of eight items that assess the likelihood of falling asleep in real-world situations, such as reading, watching television, or driving. Each item is scored from zero to 3, for a total score of zero to 24, with higher scores indicating a greater severity of EDS. Scores of ≤10 are commonly considered within the normal range, whereas scores of >10 indicate pathological EDS (Johns & Hocking, 1997;Johns, 1991). Data from patients with obstructive sleep apnea (OSA) suggest that changes of 2-3 points may be considered the minimum clinically important difference on the ESS (Crook et al., 2019;Patel et al., 2018).
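The scoring rule described above can be illustrated with a minimal Python sketch that sums eight item ratings and applies the >10 threshold. The function names, constants, and example ratings are hypothetical and are not taken from the trials or from any ESS software.

```python
from typing import Sequence

ESS_ITEM_COUNT = 8    # the ESS has eight situational items
ESS_ITEM_MAX = 3      # each item is rated from 0 to 3
ESS_NORMAL_MAX = 10   # totals of <=10 are commonly considered normal


def ess_total(items: Sequence[int]) -> int:
    """Return the ESS total score (0-24) from eight item ratings."""
    if len(items) != ESS_ITEM_COUNT:
        raise ValueError("the ESS requires exactly eight item ratings")
    if any(not 0 <= rating <= ESS_ITEM_MAX for rating in items):
        raise ValueError("each item must be rated from 0 to 3")
    return sum(items)


def indicates_pathological_eds(total: int) -> bool:
    """Apply the conventional cut-off: totals of >10 indicate pathological EDS."""
    return total > ESS_NORMAL_MAX


# Hypothetical ratings for one respondent (illustration only)
ratings = [2, 3, 1, 2, 3, 0, 2, 1]
total = ess_total(ratings)
print(total, indicates_pathological_eds(total))  # prints "14 True"
```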
The ESS is widely used in sleep research, clinical trials, and clinical practice. In clinical trials, the ESS is often used to evaluate the effects of treatment intervention on EDS in several disease states, including narcolepsy and OSA. For instance, the efficacy and regulatory approval of several wake-promoting agents, such as solriamfetol, modafinil, armodafinil, and pitolisant, have been supported by reductions (improvements) in ESS scores (Black & Hirshkowitz, 2005;Black et al., 2010;Dauvilliers et al., 2013, 2020;Harsh et al., 2006;Malhotra et al., 2020;Schweitzer et al., 2019;Szakacs et al., 2017;Thorpy et al., 2019;US Modafinil in Narcolepsy Multicenter Study Group, 2000). Despite this widespread use, few studies have evaluated the test-retest reliability of the ESS (Kendzerska et al., 2014).
The test-retest reliability of the ESS has primarily been investigated in healthy, community-based samples (Ahmed et al., 2014;Johns, 1992;Knutson et al., 2006). A few studies (Campbell et al., 2018;Lee et al., 2020;Nguyen et al., 2006;Rozgonyi et al., 2021;Taylor et al., 2019;Walker et al., 2020) have evaluated the reliability of the ESS in sleep clinic patients with suspected sleep disorders by retrospective chart review with conflicting results. In real-world clinic settings, multiple variables could change between assessments, such as the setting of the assessment (primary care setting versus sleep specialist setting), treatment interventions, caffeine use, and concomitant medications. Such factors could impact EDS and lead to greater variability in the ESS scores. These settings are not ideal for evaluating the test-retest reliability of the ESS in relation to its use as an outcome measure in a clinical trial setting. Therefore, it is necessary to examine the test-retest reliability of the ESS within a controlled clinical trial setting, in which multiple factors, such as those previously noted, would be uniform. However, there is a paucity of data ascertaining whether the ESS is reliable in a clinical trial setting and, furthermore, whether the reliability is observed in patients with sleep disorders other than suspected OSA. Solriamfetol, a dopamine and noradrenaline re-uptake inhibitor, is approved in the United States and European Union to improve wakefulness in adult patients with EDS associated with narcolepsy (75-150 mg/day) or OSA (37.5-150 mg/day) (Sunosi™ (solriamfetol) tablets Prescribing Information, 2019; Sunosi™ (solriamfetol) tablets Summary of Product Characteristics, 2020). 
In two randomised, placebo-controlled, phase III trials and one long-term open-label extension (OLE) trial evaluating the effects of solriamfetol in participants with EDS associated with narcolepsy or OSA, ESS scores were included as a primary or co-primary outcome measure (Malhotra et al., 2020;Schweitzer et al., 2019;Thorpy et al., 2019). The large sample sizes and structured nature of these studies provided an opportunity to assess the test-retest reliability of the ESS in a clinical sample in a clinical trial setting.
The aim of the present analysis was to examine the test-retest reliability of the ESS in participants with narcolepsy or OSA in a clinical trial setting, using the intraclass correlation coefficient (ICC) method (US Department of Health and Human Services, 2009).
In addition, this analysis evaluated whether certain factors that might affect EDS, including treatment (with placebo or solriamfetol) and adherence to primary OSA therapy (i.e. adherence or nonadherence), impact the reliability of the ESS.

| Study design
The present analysis includes data from phase III research from the solriamfetol clinical trial programme. This included two 12-week, randomised, double-blind, placebo-controlled, phase III clinical tri-
All studies were approved by institutional review boards or ethics committees at each institution and were performed in accordance with the Declaration of Helsinki. All participants provided written informed consent. Complete descriptions of the study methods and primary results have been published previously (Malhotra et al., 2020;Schweitzer et al., 2019;Thorpy et al., 2019) and the methods are briefly summarised below.

| Participants
For the 12-week trials, eligible participants were adults (aged 18-75 years) diagnosed with narcolepsy (Type 1 or Type 2) or OSA and with ESS scores of ≥10. Additional key inclusion criteria were a baseline mean sleep latency of <25 min (narcolepsy) or <30 min (OSA) on the Maintenance of Wakefulness Test (MWT) and a usual nightly total sleep time of ≥6 hr. Participants with OSA were also required to have current use, or a history of prior (or attempted) use, of a primary OSA therapy (i.e. one that treats the underlying airway obstruction), including positive airway pressure, a mandibular advancement device, or surgical intervention. Participants without current primary OSA therapy use or a history of a successful surgical intervention to treat the underlying obstruction were required to have tried to use a primary OSA therapy for ≥1 month, with at least one documented adjustment to the therapy. At study entry, participants were instructed to maintain the same level of use of primary OSA therapy throughout the study. Key exclusion criteria included a usual bedtime later than 1:00 a.m., night-time or variable shift work, or any other clinically relevant medical, behavioural, or psychiatric disorder associated with EDS. Concomitant treatment with other medications that may affect the evaluation of EDS was not permitted.
For the OLE trial, participants with narcolepsy or OSA who had previously completed a phase II or phase III clinical trial of solriamfetol were eligible (Bogan et al., 2015;Ruoff et al., 2016;Schweitzer et al., 2019;Strollo et al., 2019;Thorpy et al., 2019).
Due to differences in time between prior study completion and enrolment in the OLE trial, there were two groups. Group A enrolled in the OLE trial immediately after completion of one of the 12-week phase III trials. Group B historically completed one of several other solriamfetol studies and was subsequently enrolled in the OLE trial.

| Treatment
In the 12-week trials, participants were randomised to receive placebo or solriamfetol 37.5 (OSA only), 75, 150, or 300 mg once daily for 12 weeks. In the OLE trial, solriamfetol treatment was initiated at 75 mg and titrated to 75, 150, or 300 mg during a 2-week titration phase. The titration phase was followed by an open-label maintenance phase (75, 150, or 300 mg), with a total study duration of 40 weeks (Group A) or 52 weeks (Group B).

| ESS assessments
In all three trials, the ESS was administered at the investigative sites (sleep clinics) for all assessments. In the 12-week trials, the ESS was assessed at baseline and at weeks 1, 4, 8, and 12. In the OLE trial, the ESS was assessed at baseline and at weeks 2, 14, 27, and 40 (Group A) or at weeks 2, 14, 26, 39, and 52 (Group B). Participants were instructed to complete the ESS based on the level of sleepiness they experienced over the past week (7-day recall period) (Broderick et al., 2013;Plazzi et al., 2018).

| Statistical analysis
For the 12-week trials, data were analysed for the modified intent-to-treat (mITT) population, which was used for the primary efficacy analyses in these studies (Schweitzer et al., 2019;Thorpy et al., 2019) and was defined as participants who received one or more doses of study medication and had baseline and one or more postbaseline assessments. For the OLE trial, data were analysed for the safety population, defined as participants who received one or more doses of solriamfetol. The ICC estimates for ESS scores were calculated using postbaseline time-point pairs. In the 12-week trials, the time-point pairs were weeks 4 and 8, weeks 8 and 12, and weeks 4 and 12. In the OLE trial, the time-point pairs were weeks 14 and 26/27, weeks 26/27 and 39/40, and weeks 14 and 39/40 (week 52 was not included, as only Group B had data at this time-point). All analyses included participants who had data at both visits for each time-point pair.
The ICC estimates and 95% confidence intervals (CIs) were calculated for the overall population in each of the three trials and for the populations of the two 12-week trials combined. For each 12-week trial, the ICC estimates were also calculated by treatment (placebo or combined solriamfetol [all doses]). For participants with OSA (the full population of the 12-week OSA trial and the OSA subgroup of the OLE trial), ICC estimates were calculated by adherence or nonadherence to primary OSA therapy. Participants were categorised as adherent to primary OSA therapy if they had use of positive airway pressure therapy for ≥4 hr/night on ≥70% of nights, use of an oral appliance on ≥70% of nights, or receipt of an effective surgical intervention. Participants were categorised as non-adherent if they had device use at a frequency/duration less than that described above, no use of a device at all, or a surgical intervention deemed no longer effective.
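The adherence categories described above can be expressed as a small decision rule. The following Python sketch is illustrative only; the record layout (`PrimaryOsaTherapy`), its field names, and the therapy labels are hypothetical and do not reflect how the trial data were actually stored.

```python
from dataclasses import dataclass


@dataclass
class PrimaryOsaTherapy:
    # Hypothetical record; "kind" is "pap", "oral_appliance", "surgery", or "none"
    kind: str
    # PAP: % of nights with >=4 hr of use; oral appliance: % of nights used
    pct_qualifying_nights: float = 0.0
    # Surgery: whether the intervention is still considered effective
    surgery_effective: bool = False


def is_adherent(therapy: PrimaryOsaTherapy) -> bool:
    """Return True if the participant meets the adherence definition in the text."""
    if therapy.kind in ("pap", "oral_appliance"):
        return therapy.pct_qualifying_nights >= 70.0
    if therapy.kind == "surgery":
        return therapy.surgery_effective
    # No device use at all is categorised as nonadherent
    return False


print(is_adherent(PrimaryOsaTherapy("pap", pct_qualifying_nights=85.0)))     # True
print(is_adherent(PrimaryOsaTherapy("oral_appliance", 50.0)))                # False
print(is_adherent(PrimaryOsaTherapy("surgery", surgery_effective=True)))     # True
print(is_adherent(PrimaryOsaTherapy("none")))                                # False
```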
The ICC estimates and 95% CIs were calculated for each subsample using a two-way mixed-effects model, according to the method of Shrout and Fleiss (Shrout & Fleiss, 1979).
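For readers unfamiliar with the calculation, the following Python sketch shows one way an ICC from a two-way model can be computed from a participants-by-visits table. The text does not state which Shrout and Fleiss form was used, so the sketch assumes the single-measurement, two-way mixed-effects form, ICC(3,1) = (BMS − EMS) / (BMS + (k − 1) × EMS), where BMS is the between-participant mean square, EMS the residual mean square, and k the number of visits. Confidence intervals are omitted, and the function name and example scores are hypothetical.

```python
from typing import Sequence


def icc_3_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(3,1) for an n-by-k table (rows = participants, columns = visits).

    Assumed form: two-way mixed effects, single measurement (Shrout & Fleiss, 1979).
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two participants")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every participant needs the same number (>=2) of visits")

    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (one observation per cell)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-participant mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical (week 4, week 8) ESS totals for five participants, not trial data
week4_week8 = [(12, 11), (15, 14), (9, 10), (18, 16), (7, 8)]
print(round(icc_3_1(week4_week8), 2))
```

In this form, a constant shift between visits (for example, every score dropping by 2 points) is absorbed by the visit effect and does not lower the estimate; other Shrout and Fleiss forms treat such shifts differently.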

| Participant demographics
Across all three trials, the majority of participants were White, not Hispanic or Latino, and primarily enrolled at sites in North America.
In the 12-week trial in participants with narcolepsy (mITT population), the majority of participants were female, mean age was ~36 years, and mean body mass index (BMI) was ~28 kg/m² (Table 1).
In the 12-week trial of OSA (mITT population), the majority of participants were male, the mean age was ~54 years, and the mean BMI was ~33 kg/m² (Table 1). In the OLE trial (safety population), 52% of participants were male, the mean age was ~49 years, and the mean BMI was ~32 kg/m² (baseline data for the OLE trial have been previously reported (Malhotra et al., 2020)).

| Test-retest reliability of ESS scores in the 12-week and 1-year trials (pooled data)
In the overall study populations, ICC estimates (95% CIs) ranged

| Test-retest reliability of ESS scores in the 12-week trials (by indication)
In the individual 12-week trials, the ICC estimates (95% CI) ranged

| Test-retest reliability of the ESS scores in the 12-week trials (by indication and treatment)
In the 12-week trial in participants with narcolepsy, the ICC esti-

[Figure panels: 12-week study (OSA); 12-week study (narcolepsy); OLE]
Lee et al. (Lee et al., 2020) was 0.75 (scores of <0.9 indicate poor reliability) for pairs of ESS assessments that were an hour apart. Walker et al. (2020) also found variability in individual ESS scores but found substantial agreement when the ESS was analysed in a binary fashion (i.e. sleepy or normal) using an ESS score cut-off of ≥11; 89% of patients with ESS scores of ≥11 at the first assessment also had ESS scores of ≥11 at the second assessment (up to 90 days later).
Several factors may account for the discrepancy in these findings.
First, the methods for assessing test-retest reliability differed among studies. Notably, only one (Lee et al., 2020) used the ICC method, which was selected for use in the present study because it is an established statistical method for evaluation of test-retest reliability. Other methods, such as a naïve correlation analysis, are not sensitive to systematic differences in repeated measures. The present analysis evaluated data from prospective clinical trials, whereas most other studies retrospectively analysed data from chart reviews in clinical practice (Campbell et al., 2018;Lee et al., 2020;Nguyen et al., 2006;Taylor et al., 2019;Walker et al., 2020).
Further, in the retrospective chart review analyses, there was no control of, or means of assessing, other factors that may have changed between the first and second assessments and impacted intra-participant variability of EDS (e.g. change in total sleep time, medication, or caffeine use). Finally, there was variability in how the test was administered or completed. In many cases, the first and second ESS assessments were administered in different settings (primary care versus specialist) (Campbell et al., 2018;Lee et al., 2020;Nguyen et al., 2006;Taylor et al., 2019;Walker et al., 2020). Indeed, Taylor et al. (2019) found low ICC estimates (0.31-0.34) for assessments that were administered in different clinical settings; however, when both assessments occurred in the same setting, the ICC estimate was much higher, at 0.82.
The present analysis also found that the ESS has acceptable test-retest reliability in participants with narcolepsy. This finding is consistent with a previous study that analysed data from a randomised,

ACKNOWLEDGMENTS
Under the direction of the authors, Hannah Ritchie, PhD, and Jeannette Fee of Peloton Advantage, LLC, an OPEN Health company, provided medical writing and editorial support for this article, which was funded by Jazz Pharmaceuticals.

AUTHOR CONTRIBUTIONS
All authors contributed to the conception and design of the study.
DM conducted the statistical analyses. All authors contributed to the interpretation of the results and critical revision of the manuscript for important intellectual content, and approved the final version of the manuscript.

shares of Jazz Pharmaceuticals plc. JB is a part-time employee of Jazz Pharmaceuticals and shareholder of Jazz Pharmaceuticals plc.

DATA AVAILABILITY STATEMENT
All relevant data are provided with the manuscript and supporting files.