Self-reported emotion regulation difficulties in psychosis: Psychometric properties of the Difficulties in Emotion Regulation Scale (DERS-16)

Objective: Individuals with psychosis self-report difficulties in understanding, relating, and responding to emotions as treatment priorities, yet we lack comprehensive, reliable, and valid assessments for routine clinical use. Methods: The psychometric properties of a brief version of the Difficulties in Emotion Regulation Scale-16 (DERS-16) were examined using anonymized data from a sample of 150 outpatients with psychosis. Results: Confirmatory factor analysis supported the five-factor structure of the DERS-16. The model fit was further improved by omitting two items. Measurement invariance was shown with respect to age and gender. The DERS-16 demonstrated good internal consistency, comparable to that of the original DERS. Evidence toward convergent validity is also presented. Conclusion: Findings suggest that the DERS-16 is a reliable and valid measure of self-reported emotion regulation difficulties in individuals with psychosis. Further research on the clinical utility [...]

, increased psychological distress and reduced emotional well-being (Moran et al., 2018; Perry et al., 2011; van der Meer et al., 2009) and poorer social functioning (e.g., Kimhy et al., 2012, 2014) and have been found to mediate and moderate the association between ER strategy use and psychotic symptoms (Kimhy et al., 2020; Lui, Subramaniam, et al., 2020). Furthermore, individuals' personal accounts of their experiences of psychosis emphasize that support to deal with negative emotions and feeling less overwhelmed by emotions are important and sometimes neglected foci of psychological therapy (Greenwood et al., 2010; Griffiths et al., 2019; Holding et al., 2016; Hutchins et al., 2016; Lawlor et al., 2017). Taken together, these findings suggest that further research and clinical attention are warranted. However, while there are adequate measures of ER strategy use in psychosis (Ludwig et al., 2019), assessing and clinically targeting ER difficulties has been hindered by the lack of comprehensive and psychometrically sound measures for use with individuals with psychosis (Lawlor et al., 2020).
The Difficulties in Emotion Regulation Scale (DERS; Gratz & Roemer, 2004) is a well-established and comprehensive self-report measure of ER difficulties. It consists of six factor-analytically derived subscales: nonacceptance of negative emotions (Nonacceptance), difficulties in engaging in goal-directed behaviors when distressed (Goals), difficulties controlling impulsive behaviors when distressed (Impulse), limited access to ER strategies perceived as effective (Strategies), lack of emotional awareness (Awareness), and lack of emotional clarity (Clarity). The DERS has been found to demonstrate high levels of internal consistency and good test-retest reliability in a range of clinical and nonclinical samples (e.g., Fowler et al., 2016; Gratz & Tull, 2010; Hallion et al., 2018), including with individuals with psychosis (Bonfils & Lysaker, 2020; Owens et al., 2013). In support of its construct validity, DERS total and subscale scores are associated with behavioral, physiological, and neurological measures of ER, with multiple forms of psychopathology (e.g., anxiety, depression, and bipolar disorder) and clinically relevant constructs (e.g., experiential avoidance, emotional expressiveness, and mindfulness) and behaviors (e.g., self-harm and substance use; Gratz & Roemer, 2004; Gratz & Tull, 2010; Gratz et al., 2006; Van Rheenen et al., 2015; Vasilev et al., 2009). In support of its validity for use with individuals with psychosis specifically, one study has found that the DERS total score was positively associated with dysfunctional coping and negatively associated with more adaptive (emotion- and problem-focused) coping (Owens et al., 2013).
Furthermore, there is increasing evidence that DERS total and subscale scores change responsively during clinical interventions with inpatients and outpatients with a range of emotional disorders (e.g., Ben-Porath et al., 2014; Fowler et al., 2016; Fox et al., 2007; Gratz et al., 2014; Hallion et al., 2018; Sahlin et al., 2017; Wonderlich et al., 2014). The DERS may thus represent a useful outcome measure for the treatment of ER difficulties.
Although the DERS is widely used in clinical and nonclinical settings, concerns have been raised about its underlying factor structure, and confirmatory factor analytic studies have often required post hoc modifications to establish an adequate fit (e.g., Kökönyei et al., 2014; Neumann et al., 2010; Perez et al., 2012). It has also been suggested that the awareness factor should be excluded from the measure as it demonstrates lower reliability and validity, and a number of studies have found that a five-factor model from which it is excluded provides a better or equivalent fit (e.g., Bardeen et al., 2012; Fowler et al., 2014; Hallion et al., 2018; McDermott et al., 2009; Miguel et al., 2017; Osborne et al., 2017; Tull et al., 2007). In addition, it has been noted that many of the DERS items are conceptually similar, which participants could perceive as repetitive (e.g., Kaufman et al., 2016). Since concise assessment measures are likely to be more broadly acceptable to service users and assessors (e.g., Edwards et al., 2002; Fornells-Ambrojo et al., 2017; Rolstad et al., 2011) and more efficient in both research studies and busy clinical settings, recent research has begun to investigate the psychometric properties of briefer versions of the DERS (e.g., Bjureberg et al., 2016; Kaufman et al., 2016; Victor & Klonsky, 2016).
The DERS-16 was developed by Bjureberg et al. (2016), based on an analysis of item-total correlations of the original DERS and considerations regarding content validity. All items from the awareness subscale were excluded, resulting in a five-factor structure which has been replicated by six subsequent studies (Charak et al., 2019; Hallion et al., 2018; Miguel et al., 2017; Shahabi et al., 2018; Westerlund & Santtila, 2018; Yiĝit & Yiĝit, 2017). However, one study found that post hoc modification improved the fit (dropping two items due to cross-loadings; Westerlund & Santtila, 2018). The DERS-16 is highly correlated with the original DERS (Bjureberg et al., 2016) and has been found to have comparable psychometric properties, including good internal consistency, test-retest reliability, and convergent and discriminant validity (Bjureberg et al., 2016; Charak et al., 2019; Hafner et al., 2019; Hallion et al., 2018; Miguel et al., 2017; Shahabi et al., 2018; Skutch et al., 2019; Yiĝit & Yiĝit, 2017). Since its development, the measure has been used with a range of populations and types of psychopathology including adults with emotional disorders (Hallion et al., 2018), substance use disorders, suicidal ideation and alcohol misuse (Wilks et al., 2018), and adolescents with non-suicidal self-injury and complex substance use and mental health issues (Sloan et al., 2017). It has also been found to be responsive to the treatment of ER difficulties in clinical samples (e.g., Harvey et al., 2019; Sahlin et al., 2019). Taken together, these findings suggest the DERS-16 is a promising measure of ER difficulties.
The DERS-16 is yet to be validated in a clinical population of individuals with psychosis. Responding to the clinical need for a brief measure of specific ER difficulties that can inform personalized, targeted interventions, we sought to examine the psychometric properties of the DERS-16 in a sample of outpatients with psychosis. Specifically, the internal consistency, construct validity and latent factor structure of the DERS-16 were investigated.
We applied confirmatory factor analysis (CFA) and used self-report measures of general psychological distress, experiential avoidance, and ER skills to test the convergent validity of the DERS-16.

| Data collection and participants
Data from 150 outpatients using psychosis services in South London and Maudsley National Health Service Foundation Trust were included. Services were for working-age adults (aged 18 years and over) with established psychosis (i.e., not early intervention) including schizophrenia spectrum diagnoses and psychotic symptoms as the main presenting problem in the context of affective or other complex mental health conditions. Anonymized DERS (16- and 36-item versions), construct validity, and basic demographic (age, gender, self-reported ethnicity using six standard trust categories) data were obtained between September 2015 and June 2019 from three sources: ongoing service evaluation of routine clinical practice in which participants completed self-report measures in initial assessments for individual and group psychological therapies (Local approval: PPF-PSYCHLO-14-55); a local feasibility study of a group intervention (Research Ethics Committee Reference: 17/LO/04/45); and a cross-sectional study of ER and trauma (Research Ethics Committee Reference: 16/LO/0869). All participants in these studies completing a DERS during these dates contributed data to the current study; no participant with DERS data was excluded. Permission to use the data for this purpose was sought from the respective project leads, who anonymized the data. No other data sources were approached. Ethical committee guidance was that ethical approval was not required for the use of anonymized data.

3 | MEASURES
3.1 | Difficulties in Emotion Regulation Scale (Gratz & Roemer, 2004)
The DERS is a 36-item self-report measure of six facets of ER. Items are rated on a 5-point ordinal scale from 1 ("almost never [0%-10%]") to 5 ("almost always [91%-100%]"). The scale yields a total score and six subscale scores, with higher scores indicating greater ER difficulties. The psychometric properties of the DERS are described throughout the manuscript. The DERS was completed by participants in the cross-sectional study. It was also initially completed by participants in the service evaluation (n = 15); however, following feedback that it was lengthy to complete (particularly in the context of completing multiple self-report measures in a single assessment session), the DERS-16 was given to the remainder of participants.
3.2 | Difficulties in Emotion Regulation Scale-16 (Bjureberg et al., 2016)
The DERS-16 is a brief form of the 36-item DERS (Gratz & Roemer, 2004). It consists of 16 items which are rated on a 5-point ordinal scale from 1 ("almost never") to 5 ("almost always"), indicating how often each statement applies to the respondent. Higher scores reflect greater ER difficulties. The psychometric properties of the DERS-16 are described throughout the manuscript. The DERS-16 was completed by participants in the service evaluation and group feasibility study.
3.3 | Dialectical Behavioral Therapy Ways of Coping Checklist - Dialectical Behavioral Therapy Skills Subscale (Neacsiu, Rizvi, Vitaliano, et al., 2010)
The DBT-WCCL consists of two subscales: the DBT Skills Subscale (DSS) and the Dysfunctional Coping Scale. The measure has demonstrated adequate to excellent reliability and validity, including sensitivity to treatment effects (Neacsiu, Rizvi, Vitaliano, et al., 2010). The DSS was completed by participants in the service evaluation and served as a measure of convergent validity for the present study. The DSS comprises 38 items assessing the frequency of DBT skill use over the past month. It is rated on a 4-point ordinal scale from 0 ("never used") to 3 ("regularly used"), and higher scores indicate greater DBT skill use. The DSS has demonstrated adequate psychometric properties in a transdiagnostic sample (Stein et al., 2016) but, to our knowledge, has not previously been used specifically in a psychosis sample.

3.4 | Clinical Outcomes in Routine Evaluation-10 (Barkham et al., 2013)
The CORE-10 assesses psychological distress over the last week and covers well-being, symptoms, functioning, and risk. It was completed by participants in the service evaluation and served as a measure of convergent validity in the current study. The CORE-10 generates a mean distress score based on 10 items, each rated on an ordinal scale from 0 ("not at all") to 4 ("most or all the time"). A mean score of 1.1 or above indicates clinically significant distress.
The CORE-10 has demonstrated high internal consistency in clinical samples (Barkham et al., 2013) and has been used in samples with psychosis (e.g., Fornells-Ambrojo et al., 2017).
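The scoring rule described above (a mean of 10 item ratings, each from 0 to 4, with a mean of 1.1 or above indicating clinically significant distress) can be sketched as follows. This is a minimal illustration, not the scoring procedure used in the study: the function names are hypothetical, and the sketch assumes complete responses in which any reverse-keyed items have already been recoded.

```python
from statistics import mean

# Cut-off stated in the text: a mean score of 1.1 or above
# indicates clinically significant distress.
CLINICAL_CUTOFF = 1.1


def core10_mean_score(item_ratings):
    """Return the mean of ten item ratings, each on the 0-4 ordinal scale.

    Assumption: ratings are complete and already scored in the
    distress direction (no missing-data handling is attempted here).
    """
    if len(item_ratings) != 10:
        raise ValueError("expected exactly 10 item ratings")
    if any(r not in (0, 1, 2, 3, 4) for r in item_ratings):
        raise ValueError("each rating must be an integer from 0 to 4")
    return mean(item_ratings)


def is_clinically_significant(item_ratings):
    """Apply the 1.1 cut-off to the mean distress score."""
    return core10_mean_score(item_ratings) >= CLINICAL_CUTOFF


if __name__ == "__main__":
    # Hypothetical respondent, for illustration only.
    ratings = [2, 1, 3, 0, 2, 1, 2, 3, 1, 2]
    print(core10_mean_score(ratings), is_clinically_significant(ratings))
```

In practice, incomplete responses would need an explicit rule (for example, prorating or exclusion), which the text does not specify.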
3.5 | Acceptance and Action Questionnaire (Bond et al., 2011)
The AAQ-II assesses experiential avoidance/psychological flexibility, and was used in the group feasibility study data set, providing data on convergent validity for the current study. The AAQ-II is scored on an ordinal scale from 1 ("never true") to 7 ("always true"). Higher scores indicate greater difficulties in accepting mental experiences and persisting with life goals in their presence. The AAQ-II has demonstrated good internal reliability (Bond et al., 2011), including in samples of individuals with psychosis (e.g., Morris et al., 2014; Varese et al., 2016).

4 | PROCEDURE
Anonymized data from each study were collated for analysis. For each of the studies, participants had completed the self-report measures in the presence of a qualified or trainee clinical psychologist, with support if needed, and given consent for their responses to be used for evaluation/research, depending on the source study.

4.1 | Data analysis
4.1.1 | Factor analysis
CFA for categorical data using the weighted least squares mean- and variance-adjusted estimator (WLSMV; Muthén et al., 1997) was used to investigate the factor structure of the DERS-16. To evaluate the model we used measures of both absolute and relative fit, namely: the relative chi-square (χ²/df; values close to 2 indicate close fit; Hoelter, 1983), the root mean square error of approximation (RMSEA; values less than 0.08 are required for adequate fit; Browne & Cudeck, 1993), the Tucker-Lewis Index (TLI; values higher than 0.9 are required for close fit; Bentler & Bonett, 1980) and the Comparative Fit Index (CFI; values higher than 0.9 are required for close fit; Bentler, 1990).
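For readers less familiar with the fit indices named above, the following minimal sketch shows how they are conventionally derived from the chi-square statistics of the fitted model and of the baseline (independence) model. It uses the textbook formulas only; it is an assumption that these match the reported values, since software computes the indices from adjusted statistics under WLSMV. The function name and all input numbers are hypothetical and are not results from this study.

```python
import math


def fit_indices(chi2_model, df_model, chi2_baseline, df_baseline, n):
    """Textbook fit indices from model and baseline chi-square values.

    n is the sample size. RMSEA here uses the common (n - 1) divisor;
    some sources use n instead.
    """
    relative_chi2 = chi2_model / df_model

    # RMSEA: excess misfit per degree of freedom, scaled by sample size.
    rmsea = math.sqrt(max(chi2_model - df_model, 0.0) / (df_model * (n - 1)))

    # CFI: proportional reduction in noncentrality relative to the baseline.
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    cfi = 1.0 - d_model / d_baseline if d_baseline > 0 else 1.0

    # TLI: compares chi-square/df ratios of the baseline and fitted models.
    ratio_baseline = chi2_baseline / df_baseline
    tli = (ratio_baseline - relative_chi2) / (ratio_baseline - 1.0)

    return {"relative_chi2": relative_chi2, "rmsea": rmsea, "cfi": cfi, "tli": tli}


if __name__ == "__main__":
    # Illustrative inputs only (not taken from the study's output).
    for name, value in fit_indices(188.0, 94, 1120.0, 120, 125).items():
        print(f"{name}: {value:.3f}")
```

Each returned value can then be compared against the cut-offs cited in the preceding paragraph.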
The scalar invariance of the items in relation to gender and age was studied using the multiple indicators-multiple causes (MIMIC) model (Muthén, 1989). The Mplus software (Muthén & Muthén, 1998

4.1.2 | Reliability, validity, and hypothesis testing
With respect to reliability, internal consistency was evaluated via Cronbach's (1951) alpha coefficient, along with the inter-item correlations (IIC). Parametric tests (Pearson correlation coefficients and independent samples t-tests) were used in validity and hypothesis testing. Stata software (StataCorp, 2017) was used for this part of the analysis.
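As an illustration of the reliability indices described above, the sketch below computes Cronbach's alpha and the mean inter-item correlation for a small response matrix. It is written in Python rather than Stata, the function names are hypothetical, the toy data are invented for illustration, and it assumes complete data (one row per respondent, one column per item).

```python
from itertools import combinations
from statistics import mean, variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(responses[0])
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cross / (ss_x * ss_y) ** 0.5


def mean_inter_item_correlation(responses):
    """Average Pearson correlation over all pairs of items."""
    items = list(zip(*responses))
    return mean(pearson_r(a, b) for a, b in combinations(items, 2))


if __name__ == "__main__":
    # Invented ratings on a 1-5 scale: 5 respondents, 3 items.
    toy = [[1, 2, 1], [2, 2, 3], [3, 4, 3], [4, 4, 5], [5, 5, 4]]
    print(round(cronbach_alpha(toy), 3))
    print(round(mean_inter_item_correlation(toy), 3))
```

The same `pearson_r` helper corresponds to the Pearson coefficients used in the validity analyses; significance testing is omitted here.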
Numbers for each analysis vary due to missing data and variation in administered measures in routine service
Descriptive statistics for the measures used are shown in Table 1. Participants reported some level of ER difficulty (as indicated by a non-zero average score) across subscales, particularly in relation to the pursuit of goals and experiencing emotions as "overwhelming" when distressed. These items were endorsed as being experienced "about half (36%-66%) of the time," with all other items being endorsed as areas of difficulty "sometimes" (11%-35% of the time).

| Confirmatory factor analysis
CFA was conducted on the DERS-16 items, administered with (n = 64) or without (n = 61) the remaining 20 items of the 36-item DERS (total n = 125; see Table 2). [...] CFI = 0.88). The fit was much improved when Bjureberg et al.'s (2016)
Note to Table 2: The loading of the first item per factor is constrained to be 1 for model identification.

| Measurement invariance
Measurement invariance with respect to age and gender was investigated using the MIMIC model (one adjusted for the other). Gender was not found to affect the measurement of any items, whereas age negatively influenced one item (I14: estimate = −0.021, p = 0.041) and positively influenced another (S28: estimate = 0.021, p = 0.035). That is, measurement non-invariance was found in only two items, only with respect to age, and the size of the effects was negligible (0.021 difference in the item value per year of age). We therefore conclude that the scale measurement is not affected by age or gender. Since measurement invariance has been established, structural invariance (difference in the scores) is also tested in the following section.

| Scores, reliability, and validity of the DERS-16
The reliability indices for the DERS-16 are presented in Table 1. No significant differences in the scores occurred between males and females, in any of the subscales (p > 0.05 in all cases). The inter-correlations of the five factors were moderate to high and no significant correlations were found with age (Table 3). Table 3 presents the correlations of the DERS-16 total and subscale scores with the criterion measures.
General psychological distress as measured by the CORE-10 was positively correlated with all the DERS-16 subscales and with the total score. The correlations were in the expected direction and moderate to high (0.40-0.67). Moderate positive correlations were also found between the AAQ-II and the DERS-16 total and three of the five subscales (impulse, strategies, and nonacceptance). Weak negative correlations were found between the DSS and the total score and impulse and strategies subscales of the DERS-16. All correlations were in the expected direction and taken together provide evidence toward the convergent validity of the scale.

| DISCUSSION
Given the evidence that individuals with psychosis report difficulties with both ER strategies and ER abilities and the focus of research to date on ER strategies, there is a need for reliable and valid assessment measures of difficulties in ER abilities that can be used in routine clinical practice. We therefore investigated the psychometric properties of the DERS-16 in a sample of outpatients with psychosis. Specifically, the DERS-16 was found to demonstrate good internal consistency and its convergent validity was supported, indicating that the measure may be a reliable and valid measure of ER difficulties. The original five-factor model was also verified. Although the original 36-item DERS has been used in several studies with individuals with psychosis, to our knowledge this is the first study examining the psychometric properties of the DERS-16 in this population.
In terms of reliability, consistent with the findings of previous evaluations of the DERS-16 (Bjureberg et al., 2016; Hallion et al., 2018; Miguel et al., 2017; Shahabi et al., 2018; Westerlund & Santtila, 2018; Yiĝit & Yiĝit, 2017), internal consistency analyses indicated good reliability for the total score and all subscales. Inter-item correlations were also moderate, indicating that there are no redundant items and all items are sufficiently related to the trait. Furthermore, the DERS-16 was found to have internal consistency comparable to the original 36-item DERS, as were the 16 items of the DERS-16 when isolated from the original DERS. The current findings are also consistent with previous reports (e.g., Fowler et al., 2014) that the awareness subscale of the original 36-item DERS was less reliable than other subscales.
The validity of the DERS-16 was indicated by strong positive correlations with psychological distress, which were found for all subscales as well as the total score. This is in accordance with the findings of other studies (Bjureberg et al., 2016; Shahabi et al., 2018; Yiĝit & Yiĝit, 2017). Moderate positive correlations were also found between the DERS-16 total and impulse, strategies, and nonacceptance subscales and the AAQ-II. The AAQ-II is a measure of experiential avoidance, which refers to attempts to "alter the frequency or form of unwanted private events, including thoughts, memories, and bodily sensations, even when doing so causes personal harm" (Hayes et al., 2012, p. 981). The associations with the nonacceptance subscale are thus to be expected. The correlations with the impulse and strategies subscales are consistent with evidence that difficulties in accepting emotions may motivate efforts to suppress or avoid emotions, potentially intensifying the emotions experienced (leading to the perception that they are overwhelming or out of control) and reducing the extent to which individuals are able to flexibly and adaptively respond to them (Ford & Gross, 2018; Hayes et al., 2006; Salters-Pedneault et al., 2010; Webb et al., 2012). There were also some significant but weak correlations between the DERS-16 total score and two of its subscales (impulse and clarity) and the DSS, suggesting that individuals who self-report greater difficulties in modulating their emotions and impulse control when distressed also report applying fewer adaptive ER strategies as measured by the DSS. This pattern of correlations is consistent with the DSS constituting a measure of DBT skills for adaptively responding to emotions, particularly crisis survival skills (i.e., ways of managing impulses to act on emotions).
Scores on the DERS-16 were not found to differ according to participant age or gender. Previous studies have varied in whether they have found no gender differences (Westerlund & Santtila, 2018), small differences (Miguel et al., 2017) or differences for some but not all subscales (Hallion et al., 2018; Yiĝit & Yiĝit, 2017). Age differences have been reported by all previous psychometric evaluations of the DERS-16 (Hallion et al., 2018; Miguel et al., 2017; Westerlund & Santtila, 2018), consistent with evidence that ER difficulties reduce with age.
Results of the CFA confirmed that the original five-factor structure for the DERS-16 could be replicated, in line with previous findings (Charak et al., 2019; Hallion et al., 2018; Miguel et al., 2017; Shahabi et al., 2018; Westerlund & Santtila, 2018; Yiĝit & Yiĝit, 2017); however, two items were dropped due to cross-loadings, replicating the findings of Westerlund and Santtila (2018). Among the five factors, clarity is a two-item factor, whereas factors measured by three or more items tend to be more reliable and valid (see also Raubenheimer, 2004). In the present analyses, the two-item factor surprisingly suffices, both in terms of model fit and in terms of reliability.
Technically, the reason for the adequacy of the small factor is the strong item intercorrelations and inter-factor correlations. Our study focused on evaluating existing structures rather than refining the model, as our sample size does not allow for both exploratory and confirmatory models to be fitted. In future research, it is advisable to explore if clarity could either be augmented with a third item or omitted.

| Clinical implications
The present findings provide preliminary evidence that the DERS-16 is a reliable and valid measure that may have clinical utility as an alternative to the original 36-item DERS. Since self-reported ER difficulties are associated with behavioral indices of emotional dysregulation (e.g., Gratz et al., 2014), they may reflect actual or perceived difficulties, both of which could be helpfully targeted in psychological therapy. There is growing evidence for the effectiveness of third-wave cognitive behavioral interventions including acceptance, mindfulness, and metacognition with clients with psychosis (Jansen et al., 2019; Johns et al., 2016; Khoury et al., 2013; Louise et al., 2018; Lysaker et al., 2018); however, the acceptability and impact of specifically targeting ER difficulties are yet to be formally evaluated, and such evaluation may be facilitated by the use of the DERS-16.

| Limitations and directions for future research
The present study has a number of limitations. First, the evaluation and research studies contributing data were conducted within a particular service context and the findings require replication before they can be considered to be generalizable outside this setting. Second, test-retest reliability and discriminant validity were not evaluated, and the assessment of construct validity was limited to two measures. Third, although CFA confirmed the original five-factor model, further research is needed to ascertain the most appropriate factor structure of the DERS-16. This could also include conducting a factor analysis on the original 36-item DERS in a sample of individuals with psychosis, which was not possible in the present study due to the sample size. It should also be noted that the analyses comparing participants who completed both versions of the DERS were based on a small sample and the reported findings need replication with a larger sample. Our sample size is comparable to other studies in this clinical population and, while it may be considered small for CFA, should nevertheless provide stable results, particularly as the sample size required for robust CFA and measurement invariance estimation is also subject to the strength of the associations between and within items and factors (Wolf et al., 2013). Finally, the present analyses did not include a comparison group. Further research is needed to investigate how the ER difficulties reported by individuals with psychosis compare with those reported by nonclinical controls and other clinical groups.
LAWLOR ET AL.

Qualitative investigations of service users' views could be valuable to identify the aspects of ER that constitute treatment priorities and to inform the acceptability and clinical utility of the DERS-16 as an assessment measure.
While the validation of state-level measures of ER difficulties (e.g., the State Difficulties in Emotion Regulation Scale; Lavender, Tull, et al., 2015) would also be useful to clarify and clinically target day-to-day and moment-to-moment changes in ER difficulties, the current study provides preliminary evidence of the utility of the DERS-16 for use in routine clinical settings.

| CONCLUSIONS
To our knowledge, this is the first examination of the factor structure and psychometric properties of the DERS-16 in a sample of individuals with psychosis. The findings suggest that the DERS-16 represents a reliable and valid tool for assessing overall and specific ER difficulties and provides preliminary evidence of its clinical and research utility. Predictive validity and associations with clinically relevant outcomes should be the focus of subsequent research, as well as replication in other contexts.

PEER REVIEW STATEMENT
The peer review history for this article is available at https://publons.com/publon/10.1002/jclp.23164

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.