The brief (seven-item) eating disorder examination-questionnaire: Evaluation of a non-nested version in men and women

Objective: Several recent studies have examined the psychometric properties of brief measures of eating disorder attitudes based on the Eating Disorder Examination Questionnaire (EDE-Q). A seven-item version (the EDE-Q7) has been proposed but, as yet, has only been investigated by looking at the items when presented as part of the longer EDE-Q (i.e., as a nested version). The current study presented the EDE-Q7 as a standalone instrument and examined factor structure fit and measurement invariance across male and female genders. Methods: University students (244 women; 155 men; 1 did not identify with either gender) completed questionnaires as part of two independent studies. All individuals completed the EDE-Q7 and measures of eating disorder behaviors. In a mixed-gender subsample (n = 286), measures of depression and eating disorder-specific quality of life were also included. Confirmatory factor analysis of the EDE-Q7 was conducted on males and females independently, in addition to estimates of internal consistency reliability and validity. Measurement invariance was assessed through multigroup confirmatory factor analysis. Results: The EDE-Q7 demonstrated good internal consistency and findings supported measurement invariance by gender. In a mixed-gender subsample, the measure showed positive associations with depression and both eating disorder behaviors and eating disorder-specific quality of life. Discussion: The present study adds to the literature supporting the psychometric properties of the EDE-Q7, extending this to use of the questionnaire as a standalone instrument. Measurement invariance suggests that the measure may be appropriate for college-age men and women, although future studies should establish psychometric properties more fully.


| INTRODUCTION
The self-report Eating Disorder Examination Questionnaire (EDE-Q; Fairburn & Beglin, 1994; Fairburn & Beglin, 2008) has been used across many diverse studies to assess symptoms of an eating disorder (ED). The EDE-Q can be used to produce a summary of both attitudinal (e.g., concerns about weight) and behavioral (e.g., binge eating) symptoms and the measure has been used in clinical and non-clinical populations (e.g., see Carey et al., 2019, for a recent review). Although the EDE-Q has demonstrated acceptable reliability and validity, the originally proposed factor structure for the attitudinal items has "proven difficult to replicate" (Heiss, Boswell, & Hormes, 2018, p. 419; see also Carey et al., 2019). Furthermore, the 28-item EDE-Q may be of limited use in epidemiology research and clinical settings in which response burden is often a concern (Gideon et al., 2016; Kliem et al., 2016).
Several recent studies (conducted in North America, Mexico, and continental Europe) have investigated the psychometric properties of a seven-item version which generates three scales: dietary restraint; shape/weight overvaluation; and body dissatisfaction. The scales are similar to those of the original EDE-Q (Fairburn & Beglin, 1994) and its three-factor structure has been confirmed in a number of studies (see Table 1 and Grilo et al., 2015). Item selection was based on the interview version of the EDE-Q, the EDE (e.g., Grilo et al., 2010), and this brief measure (referred to here as the EDE-Q7) has performed well in replication studies evaluating different short forms of both the EDE (Burke et al., 2017) and EDE-Q (e.g., Calugi et al., 2017; Rand-Giovannetti et al., 2020; Serier et al., 2018).

Studies evaluating the psychometric properties of the EDE-Q7 have provided participants with all items of the full EDE-Q, which could influence participant responses and, potentially, the psychometric properties of the briefer, nested, measure (e.g., Jenkinson & Fitzpatrick, 2007). Such administration, while informative, does not exploit many of the advantages of a brief version, such as reduced participant burden. The EDE-Q7, delivered in its non-nested form, could reduce administration time and provide a measure with sound psychometric properties for use across clinical and community samples (Grilo et al., 2013, 2015). Importantly, despite differences between men and women in their body ideals and body image concerns (Carey et al., 2019; Smith et al., 2017), demonstration of measurement invariance for the EDE-Q7 could lead to a brief measure of ED symptoms which is comparable between genders.
Few studies have investigated the psychometric properties of the EDE-Q7 in male samples (Table 1; see also Rand-Giovannetti et al., 2020), reflecting an underrepresentation of males in ED research more generally (see Murray et al., 2017). Thus, an exploration of whether a brief version of the EDE-Q is suitable for both male and female populations has been recommended (Tobin et al., 2019), particularly given some findings that shorter forms of the EDE-Q may be more appropriate for gender comparison (Rand-Giovannetti et al., 2020), albeit by sacrificing some detail and potential appropriateness for men with EDs (see Smith et al., 2017). As part of an evaluation of the EDE-Q7, the current study will seek to replicate associations between its scales and variables relevant to the development and maintenance of eating pathology. For example, Burnette, Simpson, and Mazzeo (2018a) found that body mass index (BMI) was associated with eating psychopathology (e.g., weight and shape concerns, dietary restraint) in undergraduate women but not undergraduate men. Studies of weight suppression (WS; one's highest ever weight minus current weight; e.g., see Lowe, Thomas, Safer, & Butryn, 2007) have more consistently shown relationships with constructs such as body dissatisfaction, dietary restraint, and overall eating psychopathology across both genders (e.g., Burnette & Mazzeo, 2020; Burnette et al., 2018a; Burnette, Simpson, & Mazzeo, 2018b). Similarly, although investigation into the impact of age is limited (Carey et al., 2019; Rø, Reas, & Rosenvinge, 2012), findings suggest that there are no associations between age and brief EDE-Q scores (Kliem et al., 2016; McLean, Paxton, & Wertheim, 2010).
A principal aim of the current study is to evaluate the factor structure of a non-nested form of the EDE-Q7 (e.g., Grilo et al., 2015) in both men and women. It will explore measurement invariance across genders and, given previous work using the items as part of the longer EDE-Q, use confirmatory factor analysis to investigate the EDE-Q7 when presented as a standalone instrument. The study also sets out to evaluate internal consistency and intercorrelations between the EDE-Q7 scales, in addition to providing comparison with related measures, such as health-related quality of life, and presenting data on central tendency and sample distribution (e.g., skewness) from a UK student sample. We hypothesized that the EDE-Q7 would show adequate fit for both men and women, show good internal consistency, and demonstrate correlations with related measures in line with previous research (e.g., Grilo et al., 2015; Tobin et al., 2019), with more consistent correlations with BMI and WS in women.

| METHOD
The EDE-Q7 was evaluated in two independent samples: one male-only sample (Sample 1) and one mixed-gender sample (Sample 2). The two samples were taken from separate studies which differed slightly in their aims and research questions. Additional measures were completed by the mixed-gender sample. Male participants from Sample 1 and Sample 2 were grouped together for the purposes of the current study, and some analyses are presented for males and females separately.

| Participants
Participants (N = 405) were recruited from a large UK university through advertising via internal university participation schemes and social media. The study methods were approved by the School's Ethics Committee and were performed in accordance with ethical standards as laid down in the 1964 Declaration of Helsinki and its later amendments.
Sample 1 comprised a study on male body image and recruited 119 male students. Average age was 21.18 years (SD = 1.75, range = 17-25). Sample 2 were recruited as part of a study on mental health in students and comprised 286 undergraduate and postgraduate Psychology students (244 female, 36 male, one did not identify with either gender, missing n = 5). Average age was 20.51 years (SD = 4.19, range = 18-51; n = 285) and the majority (205; 71.7%) identified themselves as being from a White ethnic background.

| Measures
Across both samples, participants completed a battery of questionnaires via an online survey. The EDE-Q7 comprises seven items from the EDE-Q (Fairburn & Beglin, 2008), a widely used questionnaire assessing the frequency of ED symptoms over the past 4 weeks. The EDE-Q provides 22 attitudinal items which are rated on a 0-6 Likert scale, based on either frequency (e.g., "No days," "1-5 days," "Every day") or degree ("Not at all" to "Extremely"). An example item is "Has your weight influenced how you think about (judge) yourself as a person?" Seven items are used in the EDE-Q7 (see Grilo et al., 2015, Table 4) and the scoring structure is identical, generating three scales (Dietary Restraint, Shape/Weight Overvaluation, Body Dissatisfaction).
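To make the scoring concrete, the following is a minimal sketch of how three scale scores could be derived from seven responses. It rests on two assumptions that are not stated in this section: that items are presented in the order Dietary Restraint (three items), Shape/Weight Overvaluation (two items), and Body Dissatisfaction (two items), and that each scale score is the mean of its items, as in standard EDE-Q subscale scoring. The names `SCALE_ITEMS` and `score_ede_q7` are hypothetical.

```python
from statistics import mean

# Hypothetical item positions (1-based) for each scale; the actual item
# order in the standalone questionnaire is an assumption here.
SCALE_ITEMS = {
    "dietary_restraint": (1, 2, 3),
    "shape_weight_overvaluation": (4, 5),
    "body_dissatisfaction": (6, 7),
}


def score_ede_q7(responses):
    """Return the mean item score (0-6) for each of the three scales."""
    if len(responses) != 7:
        raise ValueError("expected exactly seven item responses")
    if any(not 0 <= r <= 6 for r in responses):
        raise ValueError("each response must lie on the 0-6 rating scale")
    return {
        scale: mean(responses[i - 1] for i in items)
        for scale, items in SCALE_ITEMS.items()
    }


# Illustrative (made-up) responses for a single participant.
scores = score_ede_q7([0, 3, 6, 2, 4, 5, 5])
```

The responses above are invented for illustration and are not study data.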
Both samples also completed six behavioral items from the longer EDE-Q assessing the frequency of objective overeating, objective binge eating (an item regarding the number of days involving binge eating is not included in the analyses), self-induced vomiting, laxative use, and "driven" exercise (Fairburn & Beglin, 2008). For the purposes of this study, frequencies of self-induced vomiting and laxative use were combined (e.g., Gideon et al., 2016) and objective overeating was not included as this is not a disordered eating behavior currently considered in diagnostic manuals.
Sample 2 also completed the PHQ-2 (Kroenke, Spitzer, & Williams, 2003) to assess symptoms of depression by asking participants to report the frequency of two items ("little interest or pleasure in doing things" and "feeling down, depressed, or hopeless") over the past 2 weeks. The responses are "not at all," "several days," "more than half the days," and "nearly every day." Item scores range from 0 to 3, giving a total score from 0 to 6. The measure has been shown to be a valid and practical tool for assessing the severity of depression (Kroenke et al., 2003; Löwe, Kroenke, & Gräfe, 2005). The Spearman-Brown coefficient (used for estimating the reliability of two-item scales; Eisinga, te Grotenhuis, & Pelzer, 2013) was 0.821.
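For readers unfamiliar with the Spearman-Brown coefficient, the sketch below shows the standard two-item form, 2r/(1 + r), where r is the correlation between the two items. The function names are hypothetical, and the inter-item correlation implied by the reported coefficient is a back-calculation under that formula rather than a value reported in the study.

```python
def spearman_brown_two_item(r: float) -> float:
    """Reliability of a two-item scale from the inter-item correlation r."""
    if not -1.0 < r <= 1.0:
        raise ValueError("r must be a correlation greater than -1")
    return 2 * r / (1 + r)


def implied_inter_item_r(coefficient: float) -> float:
    """Invert the two-item formula: r = SB / (2 - SB)."""
    if coefficient >= 2:
        raise ValueError("coefficient must be below 2")
    return coefficient / (2 - coefficient)


# Back-calculated from the reported coefficient of 0.821 (not a study result).
approx_r = implied_inter_item_r(0.821)
```

Under this formula, a coefficient of 0.821 would correspond to an inter-item correlation of roughly 0.70.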
Sample 2 also completed the Eating Disorders Quality of Life Questionnaire (EDQOL; Engel et al., 2006) as a measure of ED-specific quality of life impairment. Each of the 25 items is coded on a five-point scale (0-4) from "never" to "always" with higher scores indicating greater impairment. Four domains can be assessed in addition to a Total score (calculated as the average of all items). The Total score of the EDQOL (McDonald's ω in the current study was 0.95) was used to look at associations with health-related quality of life.
All participants were asked to provide their weight, height, and highest adult weight. This information was used to calculate BMI (weight/height²) and WS, variables which were used to explore associations with EDE-Q7 scales.
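The two derived variables can be sketched as follows. The units (kilograms and metres) are an assumption, since the section does not state how weight and height were recorded, and the function names are hypothetical.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI as weight divided by height squared (assumed units: kg, m)."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def weight_suppression(highest_adult_weight_kg: float,
                       current_weight_kg: float) -> float:
    """WS as highest adult weight minus current weight."""
    return highest_adult_weight_kg - current_weight_kg


# Illustrative (made-up) values for one participant.
example_bmi = body_mass_index(70.0, 1.75)
example_ws = weight_suppression(80.0, 70.0)
```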

| Normality tests
Inspection of the data suggested deviation from normality and the Shapiro-Wilk test was significant for all EDE-Q7 scales. Neither log nor square-root transformations sufficiently improved the scale distributions, so non-parametric tests are reported where possible.

| Missing data
Two individuals (0.49%) did not complete the EDE-Q7 (both omitted Questions 4-7 and were women in Sample 2). Little's Missing Completely at Random (MCAR) test was non-significant, indicating that the pattern of missing data was consistent with data missing completely at random; χ²(3) = 3.278, p = .351. Therefore, these two cases were excluded from the confirmatory factor analysis (CFA), in line with suggestions that such a small amount (under 1%) is likely to have little impact on fit indices (Köse, 2014) and that the data appeared to be missing at random (Jackson, Gillaspy, & Purc-Stephenson, 2009).

| Statistical analyses
CFA using Amos (Version 25) was used to evaluate the fit of the suggested three-factor model and SPSS (v25) or JASP (JASP, 2020) were used for other descriptive and inferential statistics. Two CFAs were conducted: one for men (n = 119 from Sample 1 and n = 36 from Sample 2) and one for women (n = 244, from Sample 2). Due to the presence of non-normality, ordinal data, and relative simplicity of the model, a robust full information maximum likelihood (FIML) estimation was used to evaluate model fit. Specifically, issues with non-normality were addressed by adjusting fit indices using the Bollen-Stine (Bollen & Stine, 1992) procedure for non-normal data in structural equation modeling (Walker & Smith, 2017). Assessment of fit was determined using guidelines (e.g., see Hu & Bentler, 1999): in addition to χ² and df values, the TLI (desirable ≥0.95), CFI (≥0.90, ideally ≥0.95), RMSEA (<0.10 but around 0.06 desired), and SRMR (≤0.08) are reported.
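The cutoffs listed above can be expressed as a simple classification, sketched below. This is an illustration rather than the procedure used in Amos: the labels are hypothetical, "around 0.06" for the RMSEA is treated as ≤0.06 purely for the purpose of the example, and the index values passed in are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FitIndices:
    tli: float
    cfi: float
    rmsea: float
    srmr: float


def evaluate_fit(fit: FitIndices) -> dict:
    """Label each index against the guideline cutoffs quoted in the text."""
    return {
        # TLI: only a desirable level (>= 0.95) is stated.
        "TLI": "desirable" if fit.tli >= 0.95 else "below desirable",
        # CFI: >= 0.90 acceptable, ideally >= 0.95.
        "CFI": ("desirable" if fit.cfi >= 0.95
                else "acceptable" if fit.cfi >= 0.90 else "poor"),
        # RMSEA: < 0.10 acceptable; <= 0.06 taken as desirable (assumption).
        "RMSEA": ("desirable" if fit.rmsea <= 0.06
                  else "acceptable" if fit.rmsea < 0.10 else "poor"),
        # SRMR: <= 0.08.
        "SRMR": "acceptable" if fit.srmr <= 0.08 else "poor",
    }


# Invented values for illustration only; see Table 2 for the study's results.
labels = evaluate_fit(FitIndices(tli=0.96, cfi=0.97, rmsea=0.08, srmr=0.04))
```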
To assess measurement invariance by gender, a stepwise strategy was employed using multi-group CFA (MGCFA; see Milfont & Fischer, 2010; Vandenberg & Lance, 2000). Specifically, data for the female and male samples were fitted to four different invariance models: configural; metric; scalar; and error (strict). Configural invariance tests whether the model structure is invariant across groups, with metric invariance (compared with the configural model) testing the equivalence of factor loadings. Scalar invariance tests whether the intercepts are the same and error invariance compares measurement error (error variance) across groups. In line with literature (Chen, 2007; Cheung & Rensvold, 2002), changes in CFI <0.01, RMSEA <0.015, and SRMR <0.030 were taken to indicate measurement invariance across genders. If invariance is achieved to at least the scalar level, mean-level group differences in the latent variables (scales) can be compared.
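The stepwise decision rule can be sketched as a comparison of each model with the preceding, less constrained one. The sketch assumes that "change" means the absolute difference in each index between successive models; the function names and all index values are hypothetical and do not reproduce Table 3.

```python
# Maximum change in each index still taken to indicate invariance.
THRESHOLDS = {"cfi": 0.01, "rmsea": 0.015, "srmr": 0.030}


def step_supported(less_constrained: dict, more_constrained: dict) -> bool:
    """True if every index changes by less than its threshold."""
    return all(
        abs(more_constrained[name] - less_constrained[name]) < limit
        for name, limit in THRESHOLDS.items()
    )


def stepwise_invariance(models: list) -> dict:
    """Compare each model with the previous one in the ordered sequence."""
    results = {}
    for (_, previous), (label, current) in zip(models, models[1:]):
        results[label] = step_supported(previous, current)
    return results


# Invented fit indices, ordered from least to most constrained.
example_models = [
    ("configural", {"cfi": 0.970, "rmsea": 0.060, "srmr": 0.040}),
    ("metric", {"cfi": 0.968, "rmsea": 0.058, "srmr": 0.045}),
    ("scalar", {"cfi": 0.950, "rmsea": 0.070, "srmr": 0.050}),
]
outcome = stepwise_invariance(example_models)
```

In this invented example the metric step would be supported, whereas the scalar step would not, because the CFI changes by more than 0.01.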
Scale overlap was assessed using non-parametric (Spearman's ρ) correlations among the EDE-Q7 scales. Associations of the EDE-Q7 scales were explored through correlations with age, BMI, and WS from the full sample and with PHQ-2 and EDQOL Total scores from Sample 2. McDonald's ω was used to assess internal consistency reliability due to the risk of violated assumptions in Cronbach's α (Dunn, Baguley, & Brunsden, 2014). Given the aims of the study, a sufficient sample size was needed for both CFA of the EDE-Q7 and to detect significant correlations of a small-medium size (e.g., Grilo et al., 2015).
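As an illustration of the reliability coefficient, McDonald's ω for a single-factor scale can be written in terms of standardized factor loadings as (Σλ)² / [(Σλ)² + Σ(1 − λ²)]. The sketch below applies that textbook formula under the assumptions of a one-factor model with uncorrelated residuals; it is not a reproduction of the JASP estimation, and the loadings are invented.

```python
def mcdonald_omega(standardized_loadings):
    """Omega from standardized loadings of a one-factor model."""
    if not standardized_loadings:
        raise ValueError("at least one loading is required")
    if any(not -1.0 < lam < 1.0 for lam in standardized_loadings):
        raise ValueError("standardized loadings must lie between -1 and 1")
    common = sum(standardized_loadings) ** 2
    # Residual variance of each standardized item is 1 - loading squared.
    unique = sum(1 - lam ** 2 for lam in standardized_loadings)
    return common / (common + unique)


# Invented loadings for a three-item scale (illustration only).
omega_example = mcdonald_omega([0.8, 0.8, 0.8])
```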

| Confirmatory factor analysis
For the male sample, the three-factor model fit the data well according to the CFI and SRMR (see Table 2). The TLI value was acceptable and the RMSEA marginally above the suggested cutoff. The χ²:df ratio was acceptable and the χ² test was significant (p < .05).
For the female sample, fit statistics were supportive of the model according to the TLI, CFI, SRMR, and RMSEA (Table 2). The χ²:df ratio was acceptable and, as in the male sample, the χ² test was significant (p < .05).

| MGCFA and measurement invariance
Tests for gender measurement invariance showed that the metric invariance model did not result in worse model fit compared to the configural model (see Table 3). Progressive tests of scalar and error invariance also suggested equivalence. Therefore, we conducted a comparison of mean differences between genders (see Table S3). At the item level, there were some differences between men and women on Shape/Weight Overvaluation and Body Dissatisfaction with small to medium effect sizes. The mean scores for the Dietary Restraint scale (and two of three constituent items) did not differ. Table 4 presents means, standard deviations, skewness, kurtosis, and corrected item-total correlations for the EDE-Q7 items. Item characteristics and mean-level comparisons for men and women are provided in the Supplementary Material (Tables S1-S3).

| Internal consistency and correlations
Internal consistencies, descriptive statistics, and intercorrelations of the EDE-Q7 scales are presented in Table 5. For both genders, McDonald's ω suggested good internal consistency for all EDE-Q7 scales (see Table 5). Correlations between scales ranged from 0.508 to 0.761 (moderate to strong effect sizes) and were similar in magnitude for both the female and male samples.
Scales of the EDE-Q7 were significantly correlated with disordered eating behaviors and WS. For the mixed-gender sample, the EDE-Q7 scales were also significantly correlated with EDQOL Total (R² varied between 0.15 [Dietary Restraint] and 0.37 [Body Dissatisfaction]) and PHQ-2 (see Table 6), indicating moderate to strong associations. As expected, EDE-Q7 symptoms were not significantly correlated with age in the current sample. Correlations with BMI were variable, with Dietary Restraint evidencing a small effect size, and no significant associations with Shape/Weight Overvaluation and Body Dissatisfaction. Associations between EDE-Q7 scales and BMI and WS were significant in the female sample but BMI was not significantly correlated with any scale in the male sample (see Tables S4 and S5).

| DISCUSSION
The findings reported here suggest that the factor structure of a brief, seven-item version of the popular EDE-Q (see Fairburn & Beglin, 1994; Machado et al., 2020) is acceptable whether the questionnaire is presented nested within the larger scale or, as shown here, as a non-nested, standalone version. Confirmatory factor analysis demonstrated acceptable fit across genders using several indices and although the χ² fit statistic was significant (indicating poor model fit), this may be related to the model's relative simplicity or sample size dependency (but see also Ropovik, 2015). Similarly, the RMSEA estimate was acceptable although only approaching the desirable range. Such a result may falsely indicate poor fit given the number of degrees of freedom, issues of non-normality, and associated power (see Kenny, Kaniskan, & McCoach, 2015) and should be considered alongside other indicators suggestive of adequate fit.
Results of invariance tests suggest that the three scales of the EDE-Q7 can be meaningfully compared across men and women and that, despite discrepancies in their body image ideals, the factor structure of the EDE-Q7 is suitable for both groups, at least in college samples (see also Grilo et al., 2015). Given that tests of invariance in other short forms of the EDE-Q have supported use across genders (e.g., Kliem et al., 2016), the EDE-Q7 may represent an appropriate measure for use in men and women, given that the original factor structure of the full EDE-Q has received limited empirical support in male samples (Carey et al., 2019; Rand-Giovannetti et al., 2020; but see Tobin et al., 2019). A possible explanation is that certain items of the 28-item EDE-Q may be responsible for scalar invariance across sexes (Rand-Giovannetti et al., 2020), items which are not included in the shorter form suggested by Grilo et al. (2013, 2015). Although male data suggested greater positive skew, the findings do not suggest a floor effect for many EDE-Q7 items, particularly those of the dietary restraint scale (means for which did not differ between genders; Table S3). Although a proportion of the male sample reported scores of 0 for certain EDE-Q7 items, the highest of these was 40% (Item 1) and some proportions were comparable between genders (particularly Items 2 and 3).
One benefit of the original EDE-Q is that it can provide a Global score (a mean of all 28 items), reflecting a "common core of pathological attitudinal features of eating problems" (Friborg, Reas, Rosenvinge, & Rø, 2013, p. 196). Due to the constraints on hierarchical modeling approaches with scales of fewer than three items (e.g., Reise, 2012), it was not possible to test a hierarchical or bifactor model of the three-factor EDE-Q7, and so use of the Global score obtained from the EDE-Q7 cannot currently be supported.
Estimates of internal consistency and correlations with related constructs indicate that the EDE-Q7 performs well as a standalone measure and when provided "nested" as part of the longer EDE-Q (e.g., Grilo et al., 2015). Correlations observed in the data (including correlations between the EDE-Q7 scales) were similar in magnitude to those reported by Grilo et al. (2015) and Tobin et al. (2019). Similarly, scales of the EDE-Q7 were correlated with WS and BMI in the female sample but BMI was not significantly correlated with any scale in the male sample. These findings suggest that the EDE-Q7 in a non-nested form shows similar associations to the longer EDE-Q and performs well with respect to showing divergent patterns of associations across variables in college samples (see also Grilo et al., 2015).

TABLE 6 Correlations (ρ) between EDE-Q7 scales, age, and self-report measures of weight suppression, BMI, depression, disordered eating behaviors, and eating disorder-specific quality of life across samples

As in previous studies, there was a strong correlation between the EDE-Q7 scales of body dissatisfaction and shape/weight overvaluation, which may indicate collinearity, but seems to suggest that the modified EDE-Q7 scales "reflect less overlap and redundancy" than the original EDE-Q structure (Grilo et al., 2015, p. 286; see also Tobin et al., 2019). There were also significant correlations with disordered eating behaviors (binge eating, purging, and "driven" exercise), although more so in the female sample and the current study presents correlations with depression and eating disorder-specific quality of life, albeit in only the mixed-gender sample. Quality of life has not yet been investigated with the EDE-Q7, although a recent study (Machado et al., 2020) looked at associations with impairment secondary to an ED and another study explored the association between the body dissatisfaction scale and generic quality of life (Purton et al., 2019), both noting similar findings.
Supporting the administration of a brief form, the proportion of missing data was small. However, the study had several important limitations. First, the samples comprised a well-educated, non-clinical group, which limits the generalizability of these results to the wider population. Second, the male-only subsample was collected as part of a study with a specific focus on body image, which raises concerns of self-selection bias in this sample. Third, by using a non-nested approach, it was not possible to compare alternative short-forms of the EDE-Q (e.g., Kliem et al., 2016; Machado et al., 2020), although there are shortcomings to comparing fit indices across different, nested, models (Vandenberg & Lance, 2000). Fourth, data were non-normally distributed and two cases were excluded from the CFA due to missing data, so the CFA findings should be interpreted with caution and there is a possibility of an inflated Type 1 error (e.g., Jackson et al., 2009; Muthén & Kaplan, 1985). Nonetheless, the findings are in line with existing literature and data analysis took account of non-normal data profiles.
As noted by Machado et al. (2020), further research should seek to evaluate the EDE-Q7 as a non-nested version and with more diverse samples; many of those so far have been conducted with predominantly female, well-educated samples. Although non-normality is likely to persist, larger sample sizes would afford greater confidence should the findings presented here be replicated and permit further exploration of measurement invariance (e.g., see Rand-Giovannetti et al., 2020). Future studies should consider additional tests of the psychometric properties of the EDE-Q7, including test-retest reliability and convergent validity (e.g., through use of clinical samples or independent measures of eating pathology). Inclusion of clinical samples would be useful in exploring ED behaviors and eating disorder-specific quality of life in relation to the non-nested version of the EDE-Q7, as well as providing greater information about its sensitivity and specificity. Further research into measures of ED symptoms might also consider refinement based on patient input (Rolstad, Adler, & Rydén, 2011) in addition to psychometric evaluation.
In summary, there now exists a corpus of empirical work conducted in different samples and nations and using different methodological approaches which supports the factor structure of a brief, seven-item version of the EDE-Q. The original measure (Fairburn & Beglin, 2008) was developed based on a conceptual understanding of the items (e.g., see Cooper, Cooper, & Fairburn, 1989) as opposed to statistically-driven decisions. The modified seven-item scale (Grilo et al., 2013) may strike a balance between coverage of relevant psychopathology and sound psychometric properties and be of use to both clinicians and researchers looking for a brief assessment of eating psychopathology.