Evaluating functional disability in clinical trials of lisdexamfetamine dimesylate in binge eating disorder using the Sheehan Disability Scale

Abstract Objectives This study examined Sheehan Disability Scale (SDS) performance in binge eating disorder (BED) and explored relationships between SDS and BED outcomes using data from three placebo‐controlled lisdexamfetamine (LDX) studies (two short‐term, dose‐optimized studies and one double‐blind, randomized‐withdrawal study) in adults with Diagnostic and Statistical Manual of Mental Disorders, fourth edition, text revision (DSM‐IV‐TR)–defined BED. Methods Analyses evaluated the psychometric properties of the SDS. Results Confirmatory factor analysis supported a unidimensional total score in the short‐term studies, with internal consistency (Cronbach's α) being 0.878. Total score exhibited good construct validity, with moderate and statistically significant correlations observed with Yale–Brown Obsessive Compulsive Scale modified for binge eating, Binge Eating Scale (BES), and EuroQol Group 5‐Dimension 5‐Level health status index scores. Known‐groups validity analysis for the short‐term studies demonstrated a significantly lower total score at end of study in participants considered “not ill” versus “ill” based on Clinical Global Impressions–Severity scores. SDS total score changes in the short‐term studies were greater in responders than nonresponders based on binge eating abstinence or BES score. In the randomized‐withdrawal study, SDS scores increased relative to baseline to a greater extent in participants randomized to placebo than LDX. Conclusions These analyses support the reliability, validity, and responsiveness to change of the SDS in individuals with BED.


| INTRODUCTION
p < 0.001) and week 9 (correlation coefficients: 0.38-0.79, all p < 0.001), good internal consistency (Cronbach's α of 0.79 at baseline and 0.91 at week 9), and known-groups validity (Coles et al., 2014). The SDS exhibited a single-factor structure and demonstrated strong item-total score correlations (correlation coefficients: 0.77-0.80), good internal consistency (Cronbach's α, 0.89), and known-groups validity in adults diagnosed with bipolar disorder (Arbuckle et al., 2009). This report describes the performance of the SDS in individuals diagnosed with BED and explores the relationships between SDS scores and BED outcomes using data from Phase 3 lisdexamfetamine dimesylate (LDX) clinical studies (Hudson, McElroy, Ferreira-Cornwell, Radewonuk, & Gasior, 2017; McElroy et al., 2016). In two short-term, Phase 3, randomized, placebo-controlled efficacy studies (McElroy et al., 2016), LDX reduced binge eating days/week (primary endpoint) in adults diagnosed with moderate to severe BED and was associated with greater reductions in SDS scores than placebo (D. V. Sheehan et al., 2018). In a maintenance-of-efficacy study, LDX treatment was associated with longer time to relapse (primary endpoint) to binge eating over 6 months compared with placebo (Hudson et al., 2017). The SDS was included as a secondary endpoint in this study, but the findings for the SDS have not been described.

| METHODS
Detailed descriptions of the study designs and participants for these trials have been reported (Hudson et al., 2017; McElroy et al., 2016).
A brief summary is provided here.

Spain, and Sweden] and NCT01718509 [conducted in the United States and Germany]) and one double-blind, placebo-controlled, maintenance-of-efficacy trial (http://clinicaltrials.gov/ identifier: NCT02009163 [conducted in the United States, Germany, Sweden, Spain, and Canada]) were used for these analyses. Study protocols were approved by ethics committees. Each study was conducted in accordance with the International Council for Harmonization Good Clinical Practice and the principles of the Declaration of Helsinki.
Participants provided written informed consent before study-related procedures were conducted.
The short-term studies included 3 phases: a 2-week screening phase, a 12-week double-blind phase (4 weeks of dose optimization and 8 weeks of dose maintenance), and follow-up. Participants were randomized 1:1 to receive 12 weeks of dose-optimized LDX (50 or 70 mg) or matching placebo. Treatment began with 30 mg LDX during week 1. At the start of week 2, the LDX dose was increased to 50 mg. During week 3, the LDX dose was increased to 70 mg based on tolerability and clinical need. A single dose reduction from 70 to 50 mg was allowed during week 3 if tolerability was poor; no additional dose changes were allowed if such a reduction occurred.
During dose maintenance, the optimized LDX dosage was maintained.
No dose changes were permitted beyond week 3; any participant requiring a dose reduction during the maintenance phase was discontinued. A follow-up visit occurred 1 week after the final treatment visit to assess ongoing or new safety/tolerability issues.
The maintenance-of-efficacy study included a 12-week, open-label dose-optimization phase (4 weeks of dose optimization and 8 weeks of dose maintenance); a 26-week, double-blind, randomized-withdrawal phase; and a 1-week follow-up phase (Hudson et al., 2017 (open-label baseline in the maintenance-of-efficacy study) and to provide written informed consent.
Study exclusion criteria included a current diagnosis of anorexia nervosa or bulimia nervosa, current comorbid psychiatric disorder controlled with prohibited medications or uncontrolled with significant symptoms, or condition that may confound study assessments.
Participants were not permitted to receive psychotherapy or weight loss support for BED ≤3 months before screening; have a Montgomery-Åsberg Depression Rating Scale total score ≥18 at screening; or be considered a suicide risk, have previously attempted suicide, or be currently demonstrating active suicidal ideation. A history of symptomatic cardiovascular disease, structural cardiac or heart rhythm abnormalities, or moderate or severe hypertension, or an average sitting systolic blood pressure >139 mmHg or average diastolic blood pressure >89 mmHg at screening or baseline, was also exclusionary.
Participants with a lifetime history of stimulant abuse, a history of substance abuse or dependence within the past 6 months, or known or suspected intolerance or hypersensitivity to LDX or related compounds were also excluded.

| Measures
Functional disability across the SDS domains (work/school, social life/leisure activities, and family life/home responsibilities) was assessed at baseline, week 6, and week 12 in the short-term studies, and at open-label baseline (day 0) and weeks 4, 12/randomized-withdrawal baseline, 16, 20, 24, 28, 32, and 38 Gormally, Black, Daston, & Rardin, 1982), and binge eating frequency based on self-report diary entries. The Y-BOCS-BE, a 10-item clinician-rated scale that assesses the obsessiveness of binge eating thoughts and compulsiveness of binge eating behaviors (Deal et al., 2015), was conducted at baseline and at weeks 4, 8, and 12 in the short-term studies, and at open-label baseline and weeks 4, 12/randomized-withdrawal baseline, 16, 20, 24, 28, 32, and 38 in the maintenance-of-efficacy study. Individual items were scored on a scale from 0 (no symptoms) to 4 (extreme symptoms) and summed to generate a total score (range: 0-40). The BES, which was used only in the short-term studies, is a 16-item self-report questionnaire that assesses the behavioral, affective, and attitudinal components of binge eating (Gormally et al., 1982; Timmerman, 1999).
The BES was assessed at baseline and at weeks 4, 8, and 12. Items were scored on scales ranging from 0 (no binge eating problem) to 3 (severe binge eating problem). Total score ranges from 0 to 46 (Timmerman, 1999), with scores ≤17 indicating little or no binge eating (Marcus, Wing, & Lamparski, 1985). Binge eating days/week was recorded daily in self-report diaries; entries were reviewed with the participant and confirmed by study investigators at each study visit.
Overall BED severity and its improvement over time were assessed with the 7-item, clinician-rated CGI-S and Clinical Global Impressions-Improvement (CGI-I) scales, respectively (Guy, 1976).
The CGI-S rates the severity of a participant's condition (range: 1 [normal, not at all ill] to 7 [among the most extremely ill]). The CGI-I rates improvement in the participant's condition relative to baseline (range: 1 [very much improved] to 7 [very much worse]). The CGI-S was administered at all visits; the CGI-I was administered at all visits except screening and baseline in the short-term studies and except screening and follow-up in the maintenance-of-efficacy study.
Quality of life, which was measured using the EuroQol Group 5-Dimension 5-Level (EQ-5D-5L) scale, was assessed at baseline and at weeks 4, 6, 8, 10, and 12 in the short-term efficacy studies and at screening, open-label baseline, and weeks 4, 12/randomized-withdrawal baseline, and 38 in the maintenance-of-efficacy study.
The EQ-5D-5L is a self-report scale that measures 5 dimensions of quality of life (mobility, self-care, usual activities, pain/discomfort, and anxiety/depression; Herdman et al., 2011; Janssen et al., 2013) using five response levels (no problems to extreme problems); individual dimension responses can be combined to generate a health status index. A visual analogue scale (VAS) is used to record self-rated health, with endpoints recorded as "the best health you can imagine" (score = 100) and "the worst health you can imagine" (score = 0).

| Endpoints
The prespecified efficacy, safety, and tolerability findings from these studies have been published (Hudson et al., 2017; McElroy et al., 2016). These post hoc analyses were conducted in the full analysis set (short-term studies: randomized participants taking ≥1 study drug dose and having ≥1 postbaseline primary efficacy assessment; maintenance-of-efficacy study: randomized participants who took ≥1 study drug dose during the randomized-withdrawal phase and who had ≥1 postrandomization CGI-S assessment). Data from the short-term studies were pooled.
For the short-term studies, data from baseline (binge eating days/week, Y-BOCS-BE, EQ-5D-5L, CGI-S, BES, and SDS), week 6 (SDS and CGI-S), and end of study (EOS) (SDS, BES, CGI-S, and CGI-I) were used. EOS was defined as week 12 for the SDS and BES (only observed cases were used) and as week 12/early termination, with the last observation carried forward if week 12 data were missing,
YEE ET AL.

| Data presentation and analyses
Psychometric analyses determined the factor structure, reliability, validity, and responsiveness to treatment of the SDS. A significance level of p < 0.05 was used across all analyses.
Item-level analyses examined SDS scores and reference mea- were considered acceptable.
Reliability was assessed with internal consistency and test-retest reliability in the short-term studies. Internal consistency was assessed at baseline for SDS total and domain scores, with Cronbach's α values ranging from 0.70 to 0.90 being considered acceptable (Streiner & Norman, 1995). Test-retest reliability examined reproducibility over time under stable clinical conditions. These analyses, which were conducted in participants receiving placebo who had the same CGI-S rating at baseline and week 6, calculated intraclass correlation coefficients and change scores between baseline and week 6. It was hypothesized that SDS scores would not be statistically different, as measured using two-sided paired t-tests between the two time points. Intraclass correlations >0.70 were considered acceptable.
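As an illustration of the internal consistency statistic described above, the following is a minimal sketch of Cronbach's α computed from item-level scores. The function name and the sample ratings are hypothetical and are not taken from the study data; the three columns stand in for the three SDS domain items.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    # Sample variance of each item (column) across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Sample variance of each respondent's summed (total) score
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings (0-10) on three items for five respondents
ratings = [
    [2, 3, 2],
    [5, 6, 5],
    [7, 8, 8],
    [4, 4, 5],
    [9, 9, 10],
]
alpha = cronbach_alpha(ratings)
# Acceptability range stated in the text: 0.70 to 0.90
print(round(alpha, 3), 0.70 <= alpha <= 0.90)
```

With highly redundant items, α can exceed the upper bound of the stated range, which is why the check reports the range test separately from the value itself.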

Construct validity between SDS scores and reference measures at baseline in the short-term studies and at open-label baseline in the maintenance-of-efficacy study was assessed using Pearson's r (correlation strength: small, 0.10; medium, 0.30; large, 0.50; Cohen, 1988). Known-groups validity in the two short-term studies was assessed by stratifying SDS scores at EOS by CGI-S rating. As previously described for the Y-BOCS-BE by Deal et al. (2015), groups were defined as ill (CGI-S rating ≥4) or not ill (CGI-S rating 1-3).
Analysis of covariance (ANCOVA) models controlling for age and sex assessed between-group differences in SDS scores, with post hoc comparisons conducted using Scheffé's test.
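The construct validity step can be sketched as follows, assuming paired score lists for the SDS and a reference measure. The function names are hypothetical, and the "negligible" label for |r| < 0.10 is an assumption added for completeness; the text only names the small, medium, and large thresholds.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_label(r):
    """Label correlation strength using the thresholds cited in the text."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label; not defined in the text


# Hypothetical paired scores, for illustration only
sds_total = [12, 18, 9, 22, 15, 7]
reference = [20, 27, 19, 30, 21, 15]
r = pearson_r(sds_total, reference)
print(round(r, 2), cohen_label(r))
```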
To assess responsiveness of the SDS to change in the two short-term studies, SDS total score changes from baseline to EOS were examined in responders versus nonresponders using ANCOVA models that controlled for age, sex, and baseline score. Responders were defined as participants from either treatment group with no binge eating within the last 28 days of the study or with BES scores ≤17 at EOS. Nonresponders were defined as individuals exhibiting binge eating within the last 28 days of the study or with BES scores >17 at EOS.
For SDS responder thresholds in the short-term studies, score changes indicative of treatment response were identified using triangulation of anchor-based and distribution-based methods (Revicki, Hays, Cella, & Sloan, 2008). Criteria for anchor-based methods included (1) a CGI-I rating ≤3 at EOS, (2) a ≥2-point CGI-S decrease from baseline to EOS, (3) abstaining from binge eating at EOS (defined as 0 binge eating days/week for 4 weeks before EOS), (4) ≤2 binge eating events in any week within the month before EOS, (5) abstaining from binge eating at EOS and a ≥2-point CGI-S decrease from baseline to EOS, and (6) having ≤2 binge eating events in any week within the month before EOS and a ≥2-point CGI-S decrease from baseline to EOS. Anchor-based estimates were assessed using Youden's index (sensitivity + specificity − 1).
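To make the anchor-based step concrete, the sketch below evaluates Youden's index at several candidate cutoffs for SDS score reduction against a single binary anchor. The data, function names, and candidate range are hypothetical; the study's actual anchors and receiver operating characteristic procedure may differ in detail.

```python
def youden_index(reductions, anchor_response, cutoff):
    """Youden's J for the rule 'reduction >= cutoff' against a binary anchor."""
    pairs = list(zip(reductions, anchor_response))
    tp = sum(1 for r, a in pairs if a and r >= cutoff)
    fn = sum(1 for r, a in pairs if a and r < cutoff)
    tn = sum(1 for r, a in pairs if not a and r < cutoff)
    fp = sum(1 for r, a in pairs if not a and r >= cutoff)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one anchor responder and nonresponder")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def best_cutoff(reductions, anchor_response, candidates):
    """Candidate cutoff with the highest Youden's index (first one on ties)."""
    return max(candidates,
               key=lambda c: youden_index(reductions, anchor_response, c))


# Hypothetical reductions in SDS total score (baseline minus EOS)
reductions = [10, 8, 7, 6, 9, 2, 3, 1, 5, 0]
# Hypothetical anchor status (for example, CGI-I rating <= 3 at EOS)
anchor = [True] * 5 + [False] * 5

cutoff = best_cutoff(reductions, anchor, range(1, 11))
print(cutoff, youden_index(reductions, anchor, cutoff))
```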
Criteria for distribution-based methods included calculating the 0.5 baseline SD value, which is a good approximation of clinically important differences (Norman, Sloan, & Wyrwich, 2003), the 0.5 mean change score SD, and the standard error of measurement.
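The distribution-based quantities can be sketched as below. The standard error of measurement is computed with the conventional formula SD × √(1 − reliability); which reliability coefficient was used in these analyses is not stated here, so plugging in the reported baseline Cronbach's α (0.878) is an assumption. The baseline SD of 7.44 is implied by the reported 0.5 baseline SD of 3.72 for SDS total score.

```python
from math import sqrt
from statistics import stdev


def half_sd(scores):
    """Half of the sample standard deviation of a list of scores."""
    return 0.5 * stdev(scores)


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability in [0, 1]."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1.0 - reliability)


# Baseline SD implied by the reported 0.5 baseline SD (3.72) for SDS total score
baseline_sd = 2 * 3.72
# Assumption: Cronbach's alpha is used as the reliability estimate
sem = standard_error_of_measurement(baseline_sd, 0.878)
print(round(0.5 * baseline_sd, 2), round(sem, 2))
```

Under these assumptions the SEM is about 2.60 points, which is smaller than the 0.5 SD value.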
To assess functional relapse in the maintenance-of-efficacy study, the responder threshold was determined by examining SDS scores at week 12/randomized-withdrawal baseline, week 16, and week 38. To assess functional remission, SDS remission was defined as a total score ≤6 or domain score ≤2 (K. H. Sheehan & Sheehan, 2008; D. V. Sheehan et al., 2011). Remission rates in each group (placebo or LDX) are presented at baseline and week 12 for the short-term studies and at open-label baseline, randomized-withdrawal baseline (week 12), week 16, and week 38 in the maintenance-of-efficacy study. For the maintenance-of-efficacy study, SDS total score remission rates based on remission status at week 12/randomized-withdrawal baseline are also reported; data by remission status are not reported for the short-term studies because these data are published (D. V. Sheehan et al., 2018). For both of these analyses, descriptive data are reported using observed cases.
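The remission definition and the observed-cases convention can be expressed as a short sketch. The function names and scores are hypothetical, and representing a missing assessment as None is an assumption about data layout.

```python
def total_score_remission(total_score):
    """SDS total score remission as defined in the text (total score <= 6)."""
    return total_score <= 6


def domain_score_remission(domain_score):
    """SDS domain remission as defined in the text (domain score <= 2)."""
    return domain_score <= 2


def remission_rate(total_scores):
    """Percentage in remission among observed cases (None = missing)."""
    observed = [s for s in total_scores if s is not None]
    if not observed:
        raise ValueError("no observed cases")
    remitted = sum(1 for s in observed if total_score_remission(s))
    return 100.0 * remitted / len(observed)


# Hypothetical SDS total scores at one visit; None marks a missing assessment
scores = [3, 6, 7, None, 12]
print(remission_rate(scores))
```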

| Participants
The analyses included 724 participants from the short-term studies and 267 from the maintenance-of-efficacy study. Demographic and clinical characteristics are summarized in Table 1.

| Reliability
Internal consistency (Cronbach's α) in the short-term efficacy studies at baseline was high for SDS total (0.878) and domain (work/school,  week 6 (all p < 0.0001), and intraclass correlation coefficients were below the acceptable range (>0.70) for SDS total score (0.529) and all domain scores (work/school, 0.635; social life/leisure activities, 0.467; and family life/home responsibilities, 0.393), indicating that test-retest reliability in the short-term studies was poor.

| Validity
The SDS total scores exhibited good construct validity at baseline in the short-term studies, with moderate correlations observed for Y-BOCS-BE total score, BES score, and EQ-5D-5L health status index and VAS scores (Table 3). Similar results were observed in the maintenance-of-efficacy study at open-label baseline. Moderate correlations with SDS total score were observed for Y-BOCS-BE total score and EQ-5D-5L index scores, and a low correlation was observed for EQ-5D-5L VAS score. SDS total score did not correlate with the number of binge eating days/week at baseline in the short-term studies or maintenance-of-efficacy study. Known-groups validity was demonstrated in the short-term studies, as measured by

| Responsiveness to change
The magnitude of SDS total score reductions from baseline to EOS in the short-term studies was significantly greater in responders than nonresponders when response was based on abstinence from binge eating and BES scores (Figure 2).

| Responder threshold
Mean ± SD changes in SDS total score from baseline at EOS for anchor- and distribution-based methods in the short-term studies are summarized in Table 4. Mean ± SD score changes from baseline to EOS for domain scores ranged from −2.0 ± 2.56 to −2.4 ± 2.63 for work/school, −2.8 ± 2.88 to −3.3 ± 2.92 for social life/leisure activities, and −2.5 ± 2.76 to 2.9 ± 2.71 for family life/home responsibilities. For anchor-based cutoffs, Youden's index suggested optimal score reduction cutoffs of 6-7 points for SDS total score, 1.5-2 points for work/school and family life/home responsibilities scores, and 2-3 points for social life/leisure activities scores. Distribution-based estimates were lower, with the 0.5 mean change score SD estimate being 3.76 (0.5 baseline SD = 3.72) for SDS total score (Table 4). The 0.5 mean change score SD estimates were 1.33 (0.5 baseline SD = 1.32) for work/school, 1.47 (0.5 baseline SD = 1.48) for social life/leisure activities, and 1.43 (0.5 baseline SD = 1.36) for family life/home responsibilities. Based on the findings of the anchor-based and distribution-based methods, the threshold for change to responder status was estimated to be ≥4 points for SDS total score and ≥2 points for domain scores for BED.

| Functional relapse and functional remission
At week 12/randomized-withdrawal baseline of the maintenance-of-efficacy study, mean SDS scores were comparable between participants randomized to placebo or LDX (Figure 3). In participants randomized to placebo, mean ± SD total and domain scores increased relative to week 12/randomized-withdrawal baseline at weeks 16 and 38. In contrast, mean ± SD total and domain scores were unchanged at week 16 and decreased at week 38 relative to week 12/randomized-withdrawal baseline in participants randomized to LDX.

TABLE 2 Confirmatory factor loadings for the SDS at baseline
In the short-term efficacy studies, <50% of participants met SDS remission criteria at baseline (Figure 4). Remission rates were roughly comparable with LDX and placebo at baseline, but the percentages of participants meeting remission criteria were greater with LDX than placebo at week 12 (Figure 4). Similarly, <50% of participants met SDS remission criteria at open-label baseline in the maintenance-of-efficacy study (Figure 5). However, >90% of participants in both treatment groups met SDS remission criteria at week 12/randomized-withdrawal baseline (Figure 5a-5d).

Abbreviations: CGI-I, Clinical Global Impressions-Improvement; CGI-S, Clinical Global Impressions-Severity; EOS, end of study (defined as week 12 of treatment); ROC, receiver operating characteristic; SD, standard deviation; SDS, Sheehan Disability Scale; YI, Youden's index.
The SDS total score remission rates as a function of week 12/randomized-withdrawal baseline remission status are summarized in Table 5. Most participants who met remission criteria at week 12/randomized-withdrawal baseline did not meet remission criteria at open-label baseline but did meet remission criteria at weeks 16 and 38.
None of the participants who did not meet remission criteria at week 12/randomized-withdrawal baseline met remission criteria at open-label baseline, and most did not meet remission criteria at week 16.

| DISCUSSION
The key findings of these analyses are that the SDS demonstrated good internal consistency (Cronbach's α > 0.70) and validity, was responsive to change, and exhibited stability with continued treatment in adults with BED. Based on anchor-based and distribution-based estimation methods for meaningful change, reductions of ≥4 points for SDS total score and ≥2 points for SDS domain scores were found to represent improvement to "response" status in these LDX clinical trials.
The overall findings of the psychometric analyses were comparable with previous reports in other populations (Arbuckle et al., 2009; Coles et al., 2014; Leon et al., 1992, 1997). The unidimensional factor structure observed in individuals with BED is consistent with previous observations in individuals from a primary care setting (Leon et al., 1997), individuals diagnosed with bipolar disorder (Arbuckle et al., 2009), and individuals diagnosed with panic disorder (Leon et al., 1992). The levels of internal consistency, as measured by Cronbach's α, and of inter-item correlations were also within ranges observed in other published reports (Arbuckle et al., 2009; Coles et al., 2014; Leon et al., 1992, 1997).
In the current study, poor test-retest reliability was observed at baseline and week 6 in the short-term studies when assessing a

The baseline levels of functional disability as measured by SDS scores in this population were roughly comparable to those observed in the primary care setting and in individuals diagnosed with panic disorder (Leon et al., 1992, 1997), but were lower compared with individuals diagnosed with ADHD or bipolar disorder (Arbuckle et al., 2009; Coles et al., 2014; Weiss et al., 2012). Based on the mean SDS total score at baseline, the level of functional impairment in this population of individuals with BED was mild to moderate. However, moderate to large effect sizes for the change from baseline to EOS in SDS total score were observed in responders and nonresponders, with the reported effect sizes observed in treatment responders being comparable with those observed in individuals with panic disorder treated with alprazolam (Leon et al., 1992).
Anchor- and distribution-based methods estimated that reductions of ≥4 points for SDS total score and ≥2 points for domain scores represented response in this BED population. These thresholds, which are consistent with the distribution-based values and lower than the anchor-based Youden's index, are similar to previously reported responder definitions (Arbuckle et al., 2009; Coles et al., 2014; K. H. Sheehan & Sheehan, 2008). In individuals diagnosed with bipolar disorder, mean changes considered to be "minimally improved" were estimated to be 6.0 points for total score (domain scores = 1.38-2.34 points), and 0.5 SD was estimated to be 4.05 points for total score (domain scores = 1.41-1.59 points; Arbuckle et al., 2009). In individuals diagnosed with ADHD, the responder thresholds for SDS total score were slightly lower, with anchor-based methods estimating a mean change of 2.53 points and distribution-based methods estimating the 0.5 SD to be 2.75 points (Coles et al., 2014). In the current analyses, large differences were observed between the anchor- and distribution-based methods.

Week 12/RWB 0 0 0 0 9/9 (100) 8/8 (100) 0 0
Week 16 1/9 (11.1) 3/8 (37.5) 8/9 (88.9) 5/8 (62.5) 0 0
Week 38

These data should be considered in light of certain limitations.
First, study participants did not have comorbid illnesses or psychiatric conditions. As individuals with BED are at increased risk of having medical and psychiatric comorbidities that can affect quality of life and functioning (D. V. Sheehan & Herman, 2015), it is not known how these findings would translate to a more heterogeneous population of individuals with BED. Second, substantive floor effects were observed at EOS for SDS total and domain scores. As LDX demonstrated strong treatment effects on multiple study endpoints, the observed floor effects may be partially explained by LDX treatment effects. Third, stability and remission rate findings reported during the randomized-withdrawal phase of the maintenance-of-efficacy study should be interpreted cautiously because the differential relapse rates between treatment groups (32.1% with placebo versus 3.7% with LDX; Hudson et al., 2017) may have biased the results.
Finally, as noted previously, the poor test-retest reliability observed in these analyses is likely attributable to the use of a longer test-retest period than is typically used for test-retest assessments and to the use of the CGI-S to define clinical stability.
In conclusion, in adults with moderate to severe BED who participated in LDX clinical trials, the SDS demonstrated good internal consistency and validity, was responsive to change, and exhibited stability with continued LDX treatment. Anchor-based and distribution-based methods estimated that improvement in functional disability to responder status in adults with BED is reflected by a change of ≥4 points on the SDS total score and ≥2 points on the individual domain scores.